> On Nov 1, 2016, at 3:32 PM, John McCall <rjmcc...@apple.com> wrote:
>
>> On Oct 31, 2016, at 12:22 PM, Joe Groff via swift-dev <swift-dev@swift.org> wrote:
>>
>> We currently abstract over mutable property accesses using what I'll call a continuation-based model: the materializeForSet accessor is called before an inout access, and returns a continuation callback that must be called when the inout access is completed. I think a nested-function-based model, if designed correctly, has the potential to be more efficient and easier to implement.
>
> Well, I'm not totally convinced about either of those two points, but it's definitely something we need to talk about.
>
> Let me try to break this down. I think I'm going to repeat you a lot in different words.
>
> We have a coroutine-esque problem:
> 0. Outer is executing.
> 1. Outer calls Inner.
> 2. Inner executes for a while.
> 3. Inner passes control back to Outer without abandoning its execution context.
> 4. Outer executes for a while.
> 5. Outer passes control back to Inner, potentially having changed its own execution context.
> 6. Inner executes for a while.
> 7. Inner finally abandons its execution context and passes control back to Outer.
> 8. Outer is executing.
>
> Note that there might be multiple static points for (3) and (5), each requiring a different resumption point (and creating different context changes) at (5) and (7) respectively.
>
> (I'm going to use the terms "standard spilling", "subfunction spilling", and "transfer spilling" without defining them. I'll come back to this.)
>
> I. Lowerings
>
> Currently, as you say, we do this with a sort of CPS conversion where Inner is split into multiple sub-functions. The initial function is passed a (necessarily fixed-size, opaque to Outer) buffer; at (3), it saves its context into that buffer and returns a sub-function that Outer will invoke (passing the same buffer) at point (5); at (5), that sub-function restores its context from that buffer; and finally the sub-function returns normally at (7). This has two advantages: Outer can reason locally about the data flow from (0) to (4) to (8), and Inner can return a null pointer at (3) to indicate that there's no work to do in (6) and thus save an indirect branch (at the cost of a local conditional branch). However, if Inner has interesting context to save at (3), that requires extra code to manage that data flow, and the amount of data to save can overflow the buffer and therefore require malloc. Total branching cost: the original call at (1), the easily-predicted return at (3), the conditional branch at (5), the possible indirect call at (5), and the easily-predicted return at (7). Total save/restore cost: standard spilling at (1-3) and (5-7), transfer spilling at (3-5), possible malloc.
>
> The simplest alternative proposal is a callback conversion, extracting (4) from Outer as a sub-function which is passed to Inner. Inner now gets an ordinary stack frame in which to save its context, so its data flow and control flow are easier for the backend to reason about, and malloc is not required. However, the data/control flow in Outer is significantly obscured; most importantly, different static points for (5) may need to resume to different instructions at (7) (which the continuation lowering implicitly handles by duplicating the call in (5)), and Outer's context will be modified at various points from (1-7). Total branching cost: the original call at (1), the indirect call at (3), the easily-predicted return at (5), the easily-predicted return at (7), the possible switch-out at (7). Total save/restore cost: subfunction spilling at (1-7), standard spilling at (3-5), only stack allocations.
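>
> To make the two shapes concrete, here is a rough source-level sketch of the control flow. This is deliberately hand-wavy (it is not the real materializeForSet ABI), and every name in it is made up purely for illustration:
>
>     // Illustrative only: a toy model of the two lowerings, not the real accessor ABI.
>     final class Inner {
>         private var stored = 0
>         private var inAccess = false
>
>         // Continuation-style: steps (1)-(3). Returns the current value and an
>         // optional continuation that the caller must invoke at (5) to run (6)-(7).
>         // A purely stored implementation could return nil here and let the caller
>         // skip the indirect call entirely.
>         func beginAccess() -> (Int, ((Int) -> Void)?) {
>             inAccess = true                       // (2): context that must survive...
>             return (stored, { newValue in         // (3): ...inside the continuation
>                 self.stored = newValue            // (6): writeback
>                 self.inAccess = false             // (7): tear the access down
>             })
>         }
>
>         // Callback-style: (4) is extracted from Outer and passed in, so all of
>         // Inner's context stays in a single ordinary activation frame.
>         func withAccess(_ body: (inout Int) -> Void) {
>             inAccess = true                       // (2)
>             body(&stored)                         // (3)-(5): Outer's code runs here
>             inAccess = false                      // (6)-(7)
>         }
>     }
>
>     func outerContinuationStyle(_ inner: Inner) {
>         let (current, continuation) = inner.beginAccess()   // (1)
>         var value = current
>         value += 1                                          // (4): stays local to Outer
>         if let continuation = continuation {                // (5): local conditional branch
>             continuation(value)                             // (5): possible indirect call
>         }
>     }                                                       // (8)
>
>     func outerCallbackStyle(_ inner: Inner) {
>         inner.withAccess { value in                         // (1)
>             value += 1                                       // (4): now a sub-function
>         }
>     }                                                       // (8)
>
> Even in this toy, the difference in who owns the interesting state is visible: the continuation-style Inner has to smuggle its context into the returned closure (the opaque buffer, in the real lowering), while the callback-style Inner just keeps it in its own frame, at the cost of hiding Outer's (4) behind a sub-function.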
>
> Note that the call at (5) needs to be duplicated for continuation lowering in exactly those cases where we need a switch-out at (7) in callback conversion.
>
> Your observation is that, if Outer needs to perform a nested access with this sequence at (4), a naive callback conversion will have a lot of branches. Let the outer access be (A), with its steps being (A.0-8), and the inner access be (B), with its steps being (B.0-8), so we have (A.4) := (B.0-8). If the nesting is simple (i.e. there is no code in B.0 or B.8) and the initiation sequences are compatible (i.e. you can convince (A.3) to directly make the original call for (B.1)), then it is unnecessary to extract a sub-function for (A.4), and instead (A.1) can just set up the chain of calls. Total branching cost: the original call at (A.1), the indirect call at (A.3 / B.1), the indirect call at (B.3), the easily-predicted return at (B.5), the easily-predicted return at (B.7 / A.5), the easily-predicted return at (A.7), the possible switch-out at (A.7). Total save/restore cost: subfunction spilling at (A.1-7), standard spilling at (A.3-5), standard spilling at (B.3-5), only stack allocations. But it does have some additional set-up overhead, and it expects a lot from the indirect call at (3); in particular, to get this to work well in practice, we really have to change all entrypoints for (1) so that they're callable in exactly the same way, i.e. taking a context argument, a "next callback" pointer, and an array of extra argument data for the types, indices, whatever.
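>
> For concreteness, here is a rough sketch of what such a uniform entry-point convention might look like if you wrote it out in source. Again, every name and signature here is invented purely for illustration and glosses over the real calling convention:
>
>     // Illustrative only: a made-up uniform shape for access entry points.
>     // The caller's (4), packaged so it can be passed along a chain of accesses.
>     struct NextCallback {
>         let context: UnsafeMutableRawPointer?
>         let invoke: (UnsafeMutableRawPointer?, UnsafeMutableRawPointer) -> Void
>         //           caller context             address yielded by the access
>     }
>
>     // Every access entry point -- step (1) -- takes the same three things: the
>     // caller's context, the "next callback" to run at (4), and any extra
>     // argument data (indices, metatypes, and so on).
>     typealias AccessEntry = (
>         _ callerContext: UnsafeMutableRawPointer?,
>         _ next: NextCallback,
>         _ extraArgs: UnsafeRawBufferPointer
>     ) -> Void
>
>     // A concrete entry point for an imagined stored Int: set up at (2), run the
>     // caller's code at (3)-(5) via `next`, and finish the access at (6)-(7).
>     var storage = 0
>     let storedIntEntry: AccessEntry = { callerContext, next, _ in
>         withUnsafeMutablePointer(to: &storage) { address in
>             next.invoke(callerContext, UnsafeMutableRawPointer(address))
>         }
>     }
>
>     // The simply-nested case: if the outer access (A) has no code of its own in
>     // B.0 or B.8, (A.1) can forward the very same `next` straight into (B.1)
>     // instead of wrapping it in another sub-function.
>     func forwardingAccess(through inner: @escaping AccessEntry) -> AccessEntry {
>         return { callerContext, next, extraArgs in
>             inner(callerContext, next, extraArgs)    // (A.3 / B.1): one indirect call
>         }
>     }
>
>     // Outer's side: its (4) just bumps the value through the yielded address.
>     let bumpByOne = NextCallback(context: nil, invoke: { _, rawAddress in
>         rawAddress.assumingMemoryBound(to: Int.self).pointee += 1
>     })
>     let chained = forwardingAccess(through: storedIntEntry)
>     chained(nil, bumpByOne, UnsafeRawBufferPointer(start: nil, count: 0))
>
> The benefit of forcing every entry point into the same shape is exactly that forwarding case: the chain is set up once at (A.1), and no extra sub-function has to be extracted for (A.4).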
>
> II. Spilling
>
> "Standard spilling" means ordinary compiler behavior to save values across a call; it includes putting values in callee-save registers. The compiler has a lot of intelligence about this.
>
> "Subfunction spilling" is like standard spilling except the values must be accessible to a sub-function. The compiler has some very limited intelligence for this; I'm confident that that intelligence can grow over time, but it's not much right now. This precludes using callee-save registers unless, as you suggest, we promise to preserve them when making the callback; and note that "preserve" here really means "expect our caller and our callee to have a private contract about these registers that we shouldn't disturb". In particular, our caller might set the register, our callee might expect to see that value *and change it*, and our caller might expect to see that changed value. This is completely implementable, but it's not how the backend is really engineered right now.
>
> "Transfer spilling" means saving values across a shift in execution contexts, given only a buffer. The compiler is not well-engineered for this; we'd probably always be lowering to sub-functions early on, and doing so will block a lot of value optimizations in LLVM.
>
> III. Implementation
>
> I think SIL should clearly work with structured, unified Outer and Inner functions here, freely able to move operations between them without having to worry about what goes where. That means that we have to do some sort of lowering post-SIL (or at least in some low-level late lowering, rather than in SILGen). I expect that that alone will greatly improve our current implementation.
>
> In order to do this late lowering, we'll need some structural restrictions on SIL that should be in line with many of the other restrictions we're imposing with Semantic SIL. Given that, I don't think it's particularly harder to extract a continuation than to extract an inner sub-function.
>
> IV. Summary
>
> So to sum up:
>
> Our current continuation-style lowering is good when:
> - there's a lot of data flow from 0 to 4 or from 4 to 8,
> - there's complex control flow from 4 to 8, or
> - the inner function has no code in 6, which is common for stored/addressed accesses.
>
> A callback-style lowering is good when:
> - there's a lot of data flow from 2 to 6, or
> - a zero-allocation implementation is required.
>
> A chained callback-style lowering is good when:
> - callback-style lowerings are good, and
> - simply nested accesses are common enough to pay for the slight additional overhead.
>
> Honestly, I have a hard time seeing the callback trade-offs as being the right ones in general. Removing the malloc is required for true systems programming, but when that guarantee isn't needed, I think we can provide a large enough buffer to make allocation basically a non-factor.
OK, if you're not convinced that chained callbacks are a win, then it's kind of a wash. I agree that we'd want to represent the caller and callee as independent coroutines in SIL either way, so the tradeoffs come down to how we lower to readily-available LLVM abstractions. Either approach's tradeoffs could be mitigated with enough implementation effort: we could avoid the malloc in the continuation-style implementation by manipulating the stack pointer across calls to make room for extra state, and, on the other hand, we could increase the 0-to-4-to-8 bandwidth of the callback-style lowering by saving more registers when switching into the callee's context. One possibly small advantage I think the callback style still holds is the ability for (3) to be a tail call when the callee has no code in 6 (the stored/addressed case), which saves the test and branch the caller has to perform when we represent that case as a nullable continuation return in the continuation-style model.

-Joe

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev