[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

Yuanfang Chen via Phabricator via cfe-commits Mon, 26 Apr 2021 19:13:59 -0700

ychen added a comment.

In D100739#2715964 <https://reviews.llvm.org/D100739#2715964>, @ChuanqiXu wrote:


> In D100739#2713579 <https://reviews.llvm.org/D100739#2713579>, @ychen wrote:
>
>> In D100739#2711698 <https://reviews.llvm.org/D100739#2711698>, @ChuanqiXu 
>> wrote:
>>
>>>> This is an alternative to D97915 <https://reviews.llvm.org/D97915> which 
>>>> missed proper deallocation of the over-allocated frame. This patch handles 
>>>> both allocations and deallocations.
>>>
>>> If D97915 <https://reviews.llvm.org/D97915> is not needed, we should 
>>> abandon it.
>>>
>>> For the example shows in D97915 <https://reviews.llvm.org/D97915>, it says:
>>>
>>>   #include <experimental/coroutine>
>>>   #include <iostream>
>>>   #include <stdexcept>
>>>   #include <thread>
>>>   #include <cassert>
>>>   
>>>   struct task{
>>>     struct alignas(64) promise_type {
>>>       task get_return_object() { return {}; }
>>>       std::experimental::suspend_never initial_suspend() { return {}; }
>>>       std::experimental::suspend_never final_suspend() noexcept { return 
>>> {}; }
>>>       void return_void() {}
>>>       void unhandled_exception() {}
>>>     };
>>>     using handle = std::experimental::coroutine_handle<promise_type>;
>>>   };
>>>   
>>>   auto switch_to_new_thread() {
>>>     struct awaitable {
>>>       bool await_ready() { return false; }
>>>       void await_suspend(task::handle h) {
>>>         auto i = reinterpret_cast<std::uintptr_t>(&h.promise());
>>>         std::cout << i << std::endl;
>>>         assert(i % 64 == 0);
>>>       }
>>>       void await_resume() {}
>>>     };
>>>     return awaitable{};
>>>   }
>>>   
>>>   task resuming_on_new_thread() {
>>>     co_await switch_to_new_thread();
>>>   }
>>>   
>>>   int main() {
>>>     resuming_on_new_thread();
>>>   }
>>>
>>> The assertion would fail. If this is the root problem, I think we could 
>>> adjust the align for the promise alloca like:
>>
>> The problem is that any member of the coroutine frame could be overaligned 
>> (thus make the frame overaligned) including promise, local variables, 
>> spills. The problem is *not* specific to promise.
>>
>>>   %promise = alloca %promise_type, align 8
>>>
>>> into
>>>
>>>   %promise = alloca %promise_type, align 128
>>>
>>> In other words, if this the problem we need to solve, I think we could make 
>>> it in a simpler way.
>>
>> This may not fix the problem.
>>
>>> Then I looked into the document you give in the summary. The issue#3 says 
>>> the frontend can't do some work in the process of template instantiation 
>>> due to the frontend doesn't know about the align and size of the coroutine. 
>>> But from the implementation, it looks like not the problem this patch wants 
>>> to solve.
>>
>> I meant to use that as a reference to help describe the problem (but not the 
>> solution). The document itself includes both problem statements (issue#3) 
>> and solutions (frontend-based) which are totally unrelated to this patch. It 
>> looks like it is not that useful in this case so please disregard that.
>>
>>> I am really confused about the problem. Could you please restate your 
>>> problem more in more detail? For example, would it make the alignment 
>>> incorrect like the example above? Or does we want the frontend to get 
>>> alignment information? Then what would be affected? From the title, I can 
>>> guess the size of frame would get bigger. But how big would it be? Who 
>>> would control and determine the final size?
>>
>> understood.
>>
>> There are two kinds of alignments: the alignment of a type/object at 
>> compile-time (ABI specified or user-specified), and the alignment the object 
>> of that type actually gets during runtime. The compiler assumes that the 
>> alignment of a struct is the maximal alignment of all its members. However, 
>> that assumption may not be true at runtime where the memory allocator may 
>> return a memory block that has insufficient alignment which causes some 
>> members aligned incorrectly.
>>
>> For C++ coroutine, right now the default memory allocator could only return 
>> 16 bytes aligned memory block. When any member of the coroutine frame 
>> (promise, local variables, spills etc.) has alignment > 16, the frame 
>> becomes overaligned. This could only be fixed dynamically at runtime: by 
>> over-allocating memory and then adjust the frame start address so that it 
>> aligns correctly.
>>
>> For example, suppose malloc returns 16 bytes aligned address 16, how do we 
>> make it 64 bytes aligned? align 16 up to an address that is 64 bytes aligned 
>> which is 64, so the adjustment amount is 64-16=48
>>
>> Another similar example, suppose malloc returns 16 bytes aligned address 32, 
>> how do we make it 64 bytes aligned? align 32 up to an address that is 64 
>> bytes aligned which is 64, so the adjustment amount is 64-32=32
>>
>> Another similar example, suppose malloc returns 16 bytes aligned address 48, 
>> how do we make it 64 bytes aligned? align 48 up to an address that is 64 
>> bytes aligned which is 64, so the adjustment amount is 64-48=16
>>
>> Another similar example, suppose malloc returns 16 bytes aligned address 64, 
>> how do we make it 64 bytes aligned? align 64 up to an address that is 64 
>> bytes aligned which is 64, so the adjustment amount is 64-64=0
>>
>> So the mamximal adjustment amount is 64-16=48. We don't know until runtime 
>> if the malloc returned address X is (X % 64 == 0) or (X % 64 == 16) or (X % 
>> 64 == 32) or (X % 64 == 48), so we must emit extra code to deal with all 
>> cases (by bitwise operations).
>
> Thanks for the explanation. I think I got the problem now. And my 
> understanding for the solution of this patch is, if the align of the original 
> frame is 64, then we allocate (64+48) space  to the new frame now. And the 
> original frame becomes an alloca with align 64 in the new frame. So the 
> actually frame gets the right alignment now. Do I get the problem and 
> solution right?
>
> I am wondering if there is a simpler solution. For example, after we 
> construct the frame, we can get the alignment requirement for the frame. Then 
> if the alignment is bigger than 16, we could lower the `coro.begin` to 
> aligned new instead of default new. It looks like  that the implementation 
> would be much simpler.

Oh, right now C++ coroutine standard is written in the way that the aligned new 
is not searched by the frontend. The limitation will be lifted in C++23 
(hopefully).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

Reply via email to