[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

Yuanfang Chen via Phabricator via cfe-commits Wed, 07 Jul 2021 11:21:51 -0700

ychen added a comment.

In D97915#2861178 <https://reviews.llvm.org/D97915#2861178>, @ChuanqiXu wrote:


> In D97915#2861036 <https://reviews.llvm.org/D97915#2861036>, @ychen wrote:
>
>> In D97915#2860984 <https://reviews.llvm.org/D97915#2860984>, @ChuanqiXu 
>> wrote:
>>
>>> In D97915#2860916 <https://reviews.llvm.org/D97915#2860916>, @ychen wrote:
>>>
>>>> In D97915#2859237 <https://reviews.llvm.org/D97915#2859237>, @ChuanqiXu 
>>>> wrote:
>>>>
>>>>> In D97915#2848816 <https://reviews.llvm.org/D97915#2848816>, @ychen wrote:
>>>>>
>>>>>>> Thanks for clarifying. Let's solve the semantics problem first.
>>>>>>> With the introduction about 'raw frame', I think it's necessary to 
>>>>>>> introduce this concept in the section 'Switched-Resume Lowering' or 
>>>>>>> even the section 'Introduction' in the document. Add a section to tell 
>>>>>>> the terminology is satisfied too.
>>>>>>
>>>>>> Done.
>>>>>>
>>>>>>> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
>>>>>>> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the 
>>>>>>> same value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are 
>>>>>>> trying to solve the problem about memory leak. But I think we could use 
>>>>>>> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame 
>>>>>>> (Maybe we need to add an intrinsic `llvm.coro.raw.size`).  Then we can 
>>>>>>> omit a field in the frame to save space.
>>>>>>
>>>>>> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame 
>>>>>> address instead of raw frame pointer)
>>>>>>
>>>>>> Apologies for the confusion. I've briefly explained it here 
>>>>>> https://reviews.llvm.org/D102145#2752445 I think it is not clear. 
>>>>>> "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a 
>>>>>> coroutine frame field storing the `raw frame pointer`" only after 
>>>>>> `insertSpills` in CoroFrame.cpp. Before that, 
>>>>>> "llvm.coro.raw.frame.ptr.addr" is actually an alloca storing the `raw 
>>>>>> frame pointer` (try grepping "alloc.frame.ptr" in this review page). 
>>>>>> Using  "llvm.coro.raw.frame.ptr.offset" instead of  
>>>>>> "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please 
>>>>>> check line 31. The downside is that the write to coroutine frame is not 
>>>>>> through an alloca but a direct write. It is unusual because all fields 
>>>>>> in the frame are stored as 1. special/header fields 2. alloca 3. 
>>>>>> splills. Doing the write indirectly as Alloca makes me comfortable. The 
>>>>>> tradeoff is one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do 
>>>>>> you think?
>>>>>>
>>>>>>   19 coro.alloc.align:                                 ; preds = 
>>>>>> %coro.alloc.check.align
>>>>>>   20   %3 = sub nsw i64 64, 16
>>>>>>   21   %4 = add i64 128, %3
>>>>>>   22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
>>>>>>   23   %mask = sub i64 64, 1
>>>>>>   24   %intptr = ptrtoint i8* %call1 to i64
>>>>>>   25   %over_boundary = add i64 %intptr, %mask
>>>>>>   26   %inverted_mask = xor i64 %mask, -1
>>>>>>   27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
>>>>>>   28   %diff = sub i64 %aligned_intptr, %intptr
>>>>>>   29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
>>>>>>   30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, 
>>>>>> i64 64) ]
>>>>>>   31   store i8* %call1, i8** %alloc.frame.ptr, align 8                  
>>>>>>    
>>>>>>   
>>>>>>        ; Replace line 31 with below, and must makes sure line 46~line 48 
>>>>>> is skipped.
>>>>>>        ; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
>>>>>>        ; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 
>>>>>> %poff
>>>>>>        ; %addr1 = bitcast i8* %addr to i8**
>>>>>>        ; store i8* %call1, i8** %addr1, align 8
>>>>>>   
>>>>>>   
>>>>>>   32   br label %coro.init.from.coro.alloc.align
>>>>>>   33
>>>>>>   34 coro.init.from.coro.alloc.align:                  ; preds = 
>>>>>> %coro.alloc.align
>>>>>>   35   %aligned_result.coro.init = phi i8* [ %aligned_result, 
>>>>>> %coro.alloc.align ]
>>>>>>   36   br label %coro.init
>>>>>>   37
>>>>>>   38 coro.init:                                        ; preds = 
>>>>>> %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
>>>>>>      o.init.from.coro.alloc
>>>>>>   39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ 
>>>>>> %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
>>>>>>      .coro.init, %coro.init.from.coro.alloc.align ]
>>>>>>   40   %FramePtr = bitcast i8* %5 to %f0.Frame*
>>>>>>   41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>>>>> %FramePtr, i32 0, i32 0
>>>>>>   42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** 
>>>>>> %resume.addr, align 8
>>>>>>   43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void 
>>>>>> (%f0.Frame*)* @f0.cleanup
>>>>>>   44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>>>>> %FramePtr, i32 0, i32 1
>>>>>>   45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, 
>>>>>> align 8
>>>>>>   46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 
>>>>>> 0, i32 2
>>>>>>   47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
>>>>>>   48   store i8* %8, i8** %7, align 8
>>>>>>   49   br label %AllocaSpillBB
>>>>>>   50
>>>>>>   51 AllocaSpillBB:                                    ; preds = 
>>>>>> %coro.init
>>>>>>   52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>>>>> %FramePtr, i32 0, i32 4
>>>>>>   53   %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, 
>>>>>> %f0.Frame* %FramePtr, i32 0, i32 5
>>>>>>   54   %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, 
>>>>>> %f0.Frame* %FramePtr, i32 0, i32 6
>>>>>>   55   %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, 
>>>>>> %f0.Frame* %FramePtr, i32 0, i32 7
>>>>>>   56   %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame, 
>>>>>> %f0.Frame* %FramePtr, i32 0, i32 8
>>>>>>   57   %__promise.reload.addr = getelementptr inbounds %f0.Frame, 
>>>>>> %f0.Frame* %FramePtr, i32 0, i32 10
>>>>>>   58   br label %PostSpill
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Then I am a little confused for the design again, since we would treat 
>>>>>>> the value for CoroBegin as the address of coroutine frame in the past 
>>>>>>> and it looks like to be the raw frame now. Let me reconsider if it is 
>>>>>>> OK.
>>>>>>
>>>>>> The returned value of CoroBegin is still coroutine frame not a raw frame 
>>>>>> even if the frame is overaligned. You could check the above code.
>>>>>
>>>>> Thanks for clarifying!
>>>>>
>>>>> I don't understand why we need to store the address for coroutine raw 
>>>>> frame in the coroutine frame. For example, `%call1` in your example marks 
>>>>> the address for the raw frame. Then can we use the value `%call1` in 
>>>>> every place where we want to use the address for coroutine frame?
>>>>> If yes, I think we could emit an intrinsic called 'llvm.coro.raw.frame' 
>>>>> in the frontend if we need to use the address for the raw frame. Then in 
>>>>> the middle end, we could replace `llvm.coro.raw.frame` with `%call1` 
>>>>> simply. Similarly, we could define intrinsic `llvm.coro.raw.frame.size`. 
>>>>> As far as I know from the codes, the address for the coroutine frame is 
>>>>> mainly used for deallocation. So it should be fine I guess.
>>>>>
>>>>> ---
>>>>>
>>>>> Then the code generated now looks roughly like:
>>>>>
>>>>>   if (should over align) {
>>>>>      /// ...
>>>>>      mem = ...
>>>>>   } else {
>>>>>      /// ...
>>>>>      mem = ...
>>>>>   }
>>>>>   coro.begin(id, mem);
>>>>>
>>>>> It looks redundant since the `then` part and `else` part looks  very 
>>>>> similar. I understand it would be eliminated in the middle end. But 
>>>>> another problem is that the redundant implementation in clang. Maybe we 
>>>>> could solve it by refactoring.
>>>>> But I am wondering if it is possible to use another pattern (assume 
>>>>> `llvm.coro.alloc` returns true):
>>>>>
>>>>>   %raw.frame.ptr = new(call @llvm.coro.raw.frame.size())
>>>>>   %true.frame.ptr = call @llvm.coro.frame(%raw.frame.ptr, NEW_ALIGN) ; we 
>>>>> need a better name
>>>>>   call @llvm.coro.begin(coro.id, %true.frame.ptr)
>>>>>
>>>>> Then for `llvm.coro.frame`, we could return `@raw.frame.ptr ` simply if 
>>>>> the alignment could be satisfied (alignment needed is less than 
>>>>> NEW_ALIGN). Or we could do simply to align up for the coroutine frame. 
>>>>> There are many APIs in Align.h.
>>>>> And for the destruction, we could emit:
>>>>>
>>>>>   call @delete(%raw.frame.ptr, call @llvm.coro.raw.frame.size()) 
>>>>>
>>>>> In this way, I guess we would get simpler implementation and generated 
>>>>> codes.
>>>>>
>>>>> BTW, if we choose to do so, the semantics for llvm.coro.raw.frame.ptr and 
>>>>> llvm.coro.size would change slightly. They would stands for the address 
>>>>> and size for the coroutine frame if we don't need over alignment.
>>>>>
>>>>> How do you think about this?
>>>>
>>>> I was confused by this and @rjmccall explained it here 
>>>> https://reviews.llvm.org/D97915/new/#2604871. Basically, we could not 
>>>> recover "raw frame pointer" (`%call1`) from coroutine frame pointer 
>>>> statically at deallocation time.
>>>
>>> Oh, I understand why we need to store the address for raw frame now. 
>>> Another question is that how do you think combine the pattern:
>>>
>>>   if (should over align) {
>>>      /// ...
>>>      mem = ...
>>>   } else {
>>>      /// ...
>>>      mem = ...
>>>   }
>>>   coro.begin(id, mem);
>>>
>>> into this one:
>>>
>>>   %true.frame.ptr = call @llvm.coro.create.frame(new(call 
>>> @llvm.coro.raw.frame.size()), NEW_ALIGN) ; we need a better name
>>>                                                                             
>>>                                                                             
>>>                    ; It would be lowered to store the address of the raw 
>>> frame to the alloca in the middle end if needed
>>>   call @llvm.coro.begin(coro.id, %true.frame.ptr)
>>>
>>> and this one:
>>>
>>>   call @delete(call @llvm.coro.raw.frame.ptr(), call 
>>> @llvm.coro.raw.frame.size())                                         ; Then 
>>> use  `llvm.coro.raw.frame.ptr()` and `llvm.coro.raw.frame.size()` directly 
>>> whenever we want.
>>>
>>> It looks like we could generate the same code in the front for normal and 
>>> over aligned coroutines.
>>
>> Yeah, I think it works for this patch alone. It shifts the semantic lowering 
>> from Clang to LLVM but does not perform less work. For future language 
>> support like D102147 <https://reviews.llvm.org/D102147>, 
>> @llvm.coro.create.frame needs to be repurposed based on the new semantics 
>> and that seems a sign that it should be implemented in frontend.
>
> For language support, `::operator new(size_t, align_t)`, I think it could be 
> implemented like:
>
>   %allocated = call @new(call @llvm.coro.raw.frame.size(), align_val)
>   %true.frame.ptr = call @llvm.coro.create.frame(%allocated, 0) ; if the 
> second argument is 0, it means `llvm.coro.create.frame` could be lowered to 
> `%allocated` simply.
>   call @llvm.coro.begin(coro.id, %true.frame.ptr)
>
> It looks not hard to implement. And we don't need to refactor the CodeGen 
> part a lot. In this way, I think the main effort to support `::operator 
> new(size_t, align_t)` would be in the Sema part and the works remained in 
> CodeGen part would be little. It wouldn't touch the middle end part neither.

I agree that something like this is simpler implementation-wise. Although in 
spirit, it is similar to D100739 <https://reviews.llvm.org/D100739> which we 
decided not to pursue. What is the middle-end? is it LLVM?

>> It shifts the semantic lowering from Clang to LLVM but does not perform less 
>> work.
>
> I think it would be simpler.

I agree it would be simpler.

> At least, we don't need to emit `getReturnStmtOnAllocFailure` twice and

`getReturnStmtOnAllocFailure` is called twice but the code is emitted once.

> we don't need to touch `CallCoroDelete`  neither.

If we don't touch `CallCoroDelete`, then we cannot do the equivalent work in 
LLVM since there is no concept of deallocation function concept there.

> And we don't organize the basic blocks in the CodeGenCoroutineBody.

That's true. It is delayed until CoroSplit.

> And we could emit simpler AlignupTo (Although it could be simplified further, 
> I believe).

D102147 <https://reviews.llvm.org/D102147> has a version of it.

> And the extra work we need to do is to compare the alignment requirement for 
> the coroutine frame with the second argument of `llvm.coro.create.frame` to 
> see if we need to over align coroutine frame.
> If yes, we need to lower the `llvm.coro.create.frame` to compute the true 
> address for the coroutine frame and store the raw frame address.
> If no, we could return `%allocated` simply.

Yes, it is similar to CoroFrame.cpp changes in D102147 
<https://reviews.llvm.org/D102147>.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

Reply via email to