Re: [swift-dev] Continuation-based versus nested-function-based materializeForSet

John McCall via swift-dev Tue, 01 Nov 2016 15:32:56 -0700

> On Oct 31, 2016, at 12:22 PM, Joe Groff via swift-dev <[email protected]> 
> wrote:
> We currently abstract over mutable property accesses using what I’ll call a 
> continuation-based model–the materializeForSet accessor is called before an 
> inout access, and returns a continuation callback that must be called when 
> the inout access is completed. I think a nested-function-based model, if 
> designed correctly, has the potential to be more efficient and easier to 
> implement.


Well, I'm not totally convinced about either of those two points, but it's 
definitely something we need to talk about.

Let me try to break this down.  I think I'm going to repeat you a lot in 
different words.

We have a coroutine-esque problem:
  0. Outer is executing.
  1. Outer calls Inner.
  2. Inner executes for a while.
  3. Inner passes control back to Outer without abandoning its execution 
context.
  4. Outer executes for awhile.
  5. Outer passes control back to Inner, potentially having changed its own 
execution context.
  6. Inner executes for awhile.
  7. Inner finally abandons its execution context and passes control back to 
Outer.
  8. Outer is executing.

Note that there might be multiple static points for (3) and (5), each requiring 
a different resumption point (and creating different context changes) at (5) 
and (7) respectively.

(I'm going to use the terms "standard spilling", "subfunction spilling", and 
"transfer spilling" without defining them.  I'll come back to this.)

I. Lowerings

Currently, as you say, we do this with a sort of CPS conversion where Inner is 
split into multiple sub-functions.  The initial function is passed a 
(necessarily fixed-size, opaque to Outer) buffer; at (3), it saves its context 
into that buffer and returns a sub-function that Outer will invoke (passing the 
same buffer) at point (5); at (5), that sub-function restores its context from 
that buffer; and finally the sub-function returns normally at (7).  This has 
two advantages: Outer can reason locally about the data flow from (0) to (4) to 
(8), and Inner can return a null pointer at (3) to indicate that there's no 
work to do in (6) and thus save an indirect branch (at the cost of a local 
conditional branch).  However, if Inner has interesting context to save at (3), 
that requires extra code to manage that data flow, and the amount of data to 
save can overflow the buffer and therefore require malloc.  Total branching 
cost: the original call at (1), the easily-predicted return at (3), the 
conditional branch at (5), the possible indirect call at (5), and the 
easily-predicted return at (7).  Total save/restore cost: standard spilling at 
(1-3) and (5-7), transfer spilling at (3-5), possible malloc.

The simplest alternative proposal is a callback conversion, extracting (4) from 
Outer as a sub-function which is passed to Inner.  Inner now gets an ordinary 
stack frame to save its context, so its data flow and control flow are easier 
for the backend to reason about, and malloc is not required.  However, the 
data/control flow in Outer is significantly obscured; most importantly, 
different static points for (5) may need to resume to different instructions at 
(7) (which the continuation lowering implicitly handles by duplicating the call 
in (5)), and Outer's context will be modified at various points from (1-7).  
Total branching cost: the original call at (1), the indirect call at (3), the 
easily-predicted return at (5), the easily-predicted return at (7), the 
possible switch-out at (7).  Total save/restore cost: subfunction spilling at 
(1-7), standard spilling at (3-5), only stack allocations.

Note that the call at (5) needs to duplicated for continuation lowering in 
exactly those cases where we need a switch-out at (7) in callback conversion.

Your observation is that, if Outer needs to perform a nested access with this 
sequence at (4), a naive callback conversion will have a lot of branches.  Let 
the outer access be (A), with its steps being (A.0-8), and the inner access be 
(B), with its steps being (B.0-8), so we have (A.4) := (B.0-8).  If the nesting 
is simple — i.e. there is no code in B.0 or B.8 — and the initiation sequences 
are compatible — i.e. you can convince (A.3) to directly make the original call 
for (B.1) — then it is unnecessary to extract a sub-function for (A.4), and 
instead (A.1) can just set up the chain of calls.  Total branching cost: the 
original call at (A.1), the indirect call at (A.3 / B.1), the indirect call at 
(B.3), the easily-predicted return at (B.5), the easily-predicted return at 
(B.7 / A.5), the easily-predicted return at (A.7), the possible switch-out at 
(A.7).  Total save/restore cost: subfunction spilling at (A.1-7), standard 
spilling at (A.3-5), standard spilling at (B.3-5), only stack allocations.  But 
it does have some additional set-up overhead, and it expects a lot from the 
indirect call at (3); in particular, to get this to work well in practice, we 
really have to change all entrypoints for (1) so that they're callable in 
exactly the same way, i.e. taking a context argument, a "next callback" 
pointer, and an array of extra argument data for the types, indices, whatever.

II. Spilling

"Standard spilling" means ordinary compiler behavior to save values across a 
call; it includes putting values in callee-save registers.  The compiler has a 
lot of intelligence about this.

"Subfunction spilling" is like standard spilling except the values must be 
accessible to a sub-function.  The compiler has some very limited intelligence 
for this; I'm confident that that intelligence can grow over time, but it's not 
much right now.  This precludes using callee-save registers unless, like you 
suggest, we promise to preserve them when making the callback; and note that 
"preserve" here really means "expect our caller and our callee to have a 
private contract about these registers that we shouldn't disturb" — in 
particular, our caller might set the register, our callee might expect to see 
that value *and change it*, and our caller might expect to see that changed 
value.  This is completely implementable, but it's not how the backend is 
really engineered right now.

"Transfer spelling" means saving values across a shift in execution contexts, 
given only a buffer.  The compiler is not well-engineered for this; we'd 
probably always be lowering to sub-functions early on, and doing so will block 
a lot of value optimizations in LLVM.

III. Implementation

I think SIL should clearly work with structured, unified Outer and Inner 
functions here, freely able to move operations between them without having to 
worry about what goes where.  That means that we have to do some sort of 
lowering post-SIL (or at least in some low-level late lowering, rather than in 
SILGen).  I expect that that alone will greatly improve our current 
implementation.

In order to do this late lowering, we'll need some structural restrictions on 
SIL that should be in line with many of the other restrictions we're imposing 
with Semantic SIL.  Given that, I don't think that it's particularly harder to 
extract a continuation vs. extracting an inner sub-function.

IV. Summary

So to sum up:

Our current continuation-style lowering is good when:
  - there's a lot of data flow from 0 to 4 or from 4 to 8,
  - there's complex control flow from 4 to 8, or
  - the inner function has no code in 6, which is common for stored/addressed 
accesses.

A callback-style lowering is good when:
  - there's a lot of data flow from 2 to 6 or
  - a zero-allocation implementation is required.

A chained callback-style lowering is good when:
  - callback-style lowerings are good and
  - simply nested accesses are common enough to pay for the slight additional 
overhead.

Honestly, I have a hard time seeing the callback trade-offs as being the right 
ones in general.  Removing the malloc is required for true systems programming, 
but when that guarantee isn't needed, I think we can provide a large enough 
buffer to make allocation basically a non-factor.

John.
_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev

Re: [swift-dev] Continuation-based versus nested-function-based materializeForSet

Reply via email to