Rebased the patch to GCC 14 trunk. Bootstrapped and regression tested
again on x86_64-pc-linux-gnu, only difference is the new test failing
without the patch.

On Jul 13 2022, at 2:54 pm, Michal Jankovic
<michal.jankovi...@gmail.com> wrote:

> Hi Iain,
>  
> thanks for the info. I have some follow-up questions.
>  
> On Jul 12 2022, at 7:11 pm, Iain Sandoe <i...@sandoe.co.uk> wrote:
>  
>> Hi Michal,
>>   
>>> On 12 Jul 2022, at 16:14, Michal Jankovič
>>> <michal.jankovi...@gmail.com> wrote:
>>   
>>> One other related thing I would like to investigate is reducing the
>>> number of compiler generated variables in the frame, particularly
>>> _Coro_destroy_fn and _Coro_self_handle.    
>>>   
>>> As I understand it, _Coro_destroy_fn just sets a flag in
>>> _Coro_resume_index and calls _Coro_resume_fn; it should be possible to
>>> move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn
>>> is stored in the frame;
>>   
>> That is a particular point about GCC’s implementation … (it is not
>> neccesarily, or even
>> likely to be the same for other implementations) - see below.
>>   
>> I was intending to do experiment with making the ramp/resume/destroy
>> value a parameter
>> to the actor function so that we would have something like -
>>   
>> ramp calls      actor(frame, 0)
>> resume calls  actor(frame, 1)
>> destroy calls  actor(frame, 2)   
>> - the token values are illustrative, not intended to be a final version.
>>   
>> I think that should allow for more inlining opportunites and possibly
>> a way forward to
>> frame elision (a.k.a halo).
>>   
>>> this would however change the coroutine ABI - I don't know if that's
>>> a problem.
>>   
>> The external ABI for the coroutine is the   
>> resume,
>> destroy pointers   
>> and the promise   
>> and that one can find each of these from the frame pointer.
>>   
>> This was agreed between the interested “vendors” so that one compiler
>> could invoke
>> coroutines built by another.  So I do not think this is so much a
>> useful area to explore.
>>   
>  
> I understand. I still want to try to implement a more light-weight frame
> layout with just one function pointer; would it be possible to merge
> such a change if it was made opt-in via a compiler flag, eg
> `-fsmall-coroutine-frame`? My use-case for this is embedded environments
> with very limited memory, and I do not care about interoperability with
> other compilers there.   
>  
>> Also the intent is that an indirect call through the frame pointer is
>> the most frequent
>> operation so should be the most efficient.    
>>  resume() might be called many times,   
>>  destroy() just once thus it is a cold code path   
>>  - space can be important too - but interoperability was the goal here.
>>   
>>> The _Coro_self_handle should be constructible on-demand from the
>>> frame address.
>>   
>> Yes, and in the header the relevant items are all constexpr - so that
>> should happen in the
>> user’s code.  I elected to have that value in the frame to avoid
>> recreating it each time - I
>> suppose that is a trade-off of one oiptimisation c.f. another …   
>  
> If the handle construction cannot be optimized out, and its thus   
> a tradeoff between frame size and number of instructions, then this
> could also be enabled by a hypothetical `-fsmall-coroutine-frame`.
>  
> Coming back to this:
>  
>>>> (the other related optimisation is to eliminate frame entries for
>>>> scopes without any suspend
>>>> points - which has the potential to save even more space for code with
>>>> sparse use of co_xxxx)
>  
> This would be nice; although it could encompassed by a more general   
> optimization - eliminate frame entries for all variables which are not
>   
> accessed (directly or via pointer / reference) beyond a suspend point.
> To be fair, I do not know how to get started on such an optimization,
> or if it is even possible to do on the frontend. This would however be
> immensely useful for reducing the frame size taken-up by complicated
> co_await expressions (among other things), for example, if I have a
> composed operation:
>  
> co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2());
>  
> Right now, this creates space in the frame for the temporary 'leaf'   
> awaitables, which were already moved into the composed awaitable.
> If the awaitable has an operator co_await that returns the real awaiter,
> the original awaitable is also stored in the frame, even if it   
> is not referenced by the awaiter; another unused object gets stored if
>   
> the .await_transform() customization point was used.
>  
> What are your thoughts on the feasibility / difficulty of implementing
> such an optimization?
>  
> Michal
>  
>>>   
>>> Do you have any advice / opinions on this before I try to implement it?
>>   
>> Hopefully, the notes above help.
>>   
>> I will rebase my latest code changes as soon as I have a chance and
>> put them somewhere
>> for you to look at - basically, these are to try and address the
>> correctness issues we face,
>>   
>> Iain
>>   
>>   
>>>   
>>> Michal
>>>   
>>> On Jul 12 2022, at 4:08 pm, Iain Sandoe <i...@sandoe.co.uk> wrote:
>>>   
>>>> Hi Michal,
>>>>   
>>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches
>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>   
>>>>> Currently, coroutine frames store all variables of a coroutine separately,
>>>>> even if their lifetime does not overlap (they are in distinct
>>>>> scopes). This
>>>>> patch implements overlapping distinct variable scopes in the
>>>>> coroutine frame,
>>>>> by storing the frame fields in nested unions of structs. This lowers
>>>>> the size
>>>>> of the frame for larger coroutines significantly, and makes them
>>>>> more usable
>>>>> on systems with limited memory.
>>>>   
>>>> not a review (I will try to take a look at the weekend).
>>>>   
>>>> but … this is one of the two main optimisations on my TODO - so cool
>>>> for doing it.
>>>>   
>>>> (the other related optimisation is to eliminate frame entries for
>>>> scopes without any suspend
>>>> points - which has the potential to save even more space for code with
>>>> sparse use of co_xxxx)
>>>>   
>>>> Iain
>>>>   
>>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new
>>>>> test fails
>>>>> before the patch and succeeds after with no regressions.
>>>>>   
>>>>>   PR c++/105989
>>>>>   
>>>>> gcc/cp/ChangeLog:
>>>>>   
>>>>>   * coroutines.cc (struct local_var_info): Add field_access_path.
>>>>>   (build_local_var_frame_access_expr): New.
>>>>>   (transform_local_var_uses): Use build_local_var_frame_access_expr.
>>>>>   (coro_make_frame_entry_id): New.
>>>>>   (coro_make_frame_entry): Delegate to coro_make_frame_entry_id.
>>>>>   (struct local_vars_frame_data): Add orig, field_access_path.
>>>>>   (register_local_var_uses): Generate new frame layout. Create access
>>>>>   paths to vars.
>>>>>   (morph_fn_to_coro): Set new fields in local_vars_frame_data.    
>>>>>   
>>>>> gcc/testsuite/ChangeLog:
>>>>>   
>>>>>   * g++.dg/coroutines/pr105989.C: New test.
>>>>>   
>>>>> <pr105989.patch>
>>>>   
>>>>   
>>   
>>  
> 

Attachment: pr105989.patch
Description: Binary data

Reply via email to