> On Nov 1, 2016, at 3:32 PM, John McCall <rjmcc...@apple.com> wrote:
>
>> On Oct 31, 2016, at 12:22 PM, Joe Groff via swift-dev <swift-dev@swift.org> wrote:
>>
>> We currently abstract over mutable property accesses using what I'll call a continuation-based model: the materializeForSet accessor is called before an inout access, and returns a continuation callback that must be called when the inout access is completed. I think a nested-function-based model, if designed correctly, has the potential to be more efficient and easier to implement.
>
> Well, I'm not totally convinced about either of those two points, but it's definitely something we need to talk about.
>
> Let me try to break this down. I think I'm going to repeat you a lot in different words.
>
> We have a coroutine-esque problem:
> 0. Outer is executing.
> 1. Outer calls Inner.
> 2. Inner executes for a while.
> 3. Inner passes control back to Outer without abandoning its execution context.
> 4. Outer executes for a while.
> 5. Outer passes control back to Inner, potentially having changed its own execution context.
> 6. Inner executes for a while.
> 7. Inner finally abandons its execution context and passes control back to Outer.
> 8. Outer is executing.
>
> Note that there might be multiple static points for (3) and (5), each requiring a different resumption point (and creating different context changes) at (5) and (7) respectively.
>
> (I'm going to use the terms "standard spilling", "subfunction spilling", and "transfer spilling" without defining them. I'll come back to this.)
>
> I. Lowerings
>
> Currently, as you say, we do this with a sort of CPS conversion where Inner is split into multiple sub-functions. The initial function is passed a (necessarily fixed-size, opaque to Outer) buffer; at (3), it saves its context into that buffer and returns a sub-function that Outer will invoke (passing the same buffer) at point (5); at (5), that sub-function restores its context from that buffer; and finally the sub-function returns normally at (7). This has two advantages: Outer can reason locally about the data flow from (0) to (4) to (8), and Inner can return a null pointer at (3) to indicate that there's no work to do in (6) and thus save an indirect branch (at the cost of a local conditional branch). However, if Inner has interesting context to save at (3), that requires extra code to manage that data flow, and the amount of data to save can overflow the buffer and therefore require malloc. Total branching cost: the original call at (1), the easily-predicted return at (3), the conditional branch at (5), the possible indirect call at (5), and the easily-predicted return at (7). Total save/restore cost: standard spilling at (1-3) and (5-7), transfer spilling at (3-5), possible malloc.
>
> The simplest alternative proposal is a callback conversion, extracting (4) from Outer as a sub-function which is passed to Inner. Inner now gets an ordinary stack frame in which to save its context, so its data flow and control flow are easier for the backend to reason about, and malloc is not required. However, the data/control flow in Outer is significantly obscured; most importantly, different static points for (5) may need to resume to different instructions at (7) (which the continuation lowering implicitly handles by duplicating the call in (5)), and Outer's context will be modified at various points from (1-7). Total branching cost: the original call at (1), the indirect call at (3), the easily-predicted return at (5), the easily-predicted return at (7), the possible switch-out at (7). Total save/restore cost: subfunction spilling at (1-7), standard spilling at (3-5), only stack allocations.
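>
> To make the two shapes concrete, here is a rough source-level sketch of the control flow. This is deliberately hand-wavy (it is not the real materializeForSet ABI), and every name in it is made up purely for illustration:
>
>     // Illustrative only: a toy model of the two lowerings, not the real accessor ABI.
>     final class Inner {
>         private var stored = 0
>         private var inAccess = false
>
>         // Continuation-style: steps (1)-(3). Returns the current value and an
>         // optional continuation that the caller must invoke at (5) to run (6)-(7).
>         // A purely stored implementation could return nil here and let the caller
>         // skip the indirect call entirely.
>         func beginAccess() -> (Int, ((Int) -> Void)?) {
>             inAccess = true                       // (2): context that must survive...
>             return (stored, { newValue in         // (3): ...inside the continuation
>                 self.stored = newValue            // (6): writeback
>                 self.inAccess = false             // (7): tear the access down
>             })
>         }
>
>         // Callback-style: (4) is extracted from Outer and passed in, so all of
>         // Inner's context stays in a single ordinary activation frame.
>         func withAccess(_ body: (inout Int) -> Void) {
>             inAccess = true                       // (2)
>             body(&stored)                         // (3)-(5): Outer's code runs here
>             inAccess = false                      // (6)-(7)
>         }
>     }
>
>     func outerContinuationStyle(_ inner: Inner) {
>         let (current, continuation) = inner.beginAccess()   // (1)
>         var value = current
>         value += 1                                          // (4): stays local to Outer
>         if let continuation = continuation {                // (5): local conditional branch
>             continuation(value)                             // (5): possible indirect call
>         }
>     }                                                       // (8)
>
>     func outerCallbackStyle(_ inner: Inner) {
>         inner.withAccess { value in                         // (1)
>             value += 1                                       // (4): now a sub-function
>         }
>     }                                                       // (8)
>
> Even in this toy, the difference in who owns the interesting state is visible: the continuation-style Inner has to smuggle its context into the returned closure (the opaque buffer, in the real lowering), while the callback-style Inner just keeps it in its own frame, at the cost of hiding Outer's (4) behind a sub-function.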
>
> Note that the call at (5) needs to be duplicated for continuation lowering in exactly those cases where we need a switch-out at (7) in callback conversion.
>
> Your observation is that, if Outer needs to perform a nested access with this sequence at (4), a naive callback conversion will have a lot of branches. Let the outer access be (A), with its steps being (A.0-8), and the inner access be (B), with its steps being (B.0-8), so we have (A.4) := (B.0-8). If the nesting is simple (i.e. there is no code in B.0 or B.8) and the initiation sequences are compatible (i.e. you can convince (A.3) to directly make the original call for (B.1)), then it is unnecessary to extract a sub-function for (A.4), and instead (A.1) can just set up the chain of calls. Total branching cost: the original call at (A.1), the indirect call at (A.3 / B.1), the indirect call at (B.3), the easily-predicted return at (B.5), the easily-predicted return at (B.7 / A.5), the easily-predicted return at (A.7), the possible switch-out at (A.7). Total save/restore cost: subfunction spilling at (A.1-7), standard spilling at (A.3-5), standard spilling at (B.3-5), only stack allocations. But it does have some additional set-up overhead, and it expects a lot from the indirect call at (3); in particular, to get this to work well in practice, we really have to change all entrypoints for (1) so that they're callable in exactly the same way, i.e. taking a context argument, a "next callback" pointer, and an array of extra argument data for the types, indices, whatever.
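>
> For concreteness, here is a rough sketch of what such a uniform entry-point convention might look like if you wrote it out in source. Again, every name and signature here is invented purely for illustration and glosses over the real calling convention:
>
>     // Illustrative only: a made-up uniform shape for access entry points.
>     // The caller's (4), packaged so it can be passed along a chain of accesses.
>     struct NextCallback {
>         let context: UnsafeMutableRawPointer?
>         let invoke: (UnsafeMutableRawPointer?, UnsafeMutableRawPointer) -> Void
>         //           caller context             address yielded by the access
>     }
>
>     // Every access entry point -- step (1) -- takes the same three things: the
>     // caller's context, the "next callback" to run at (4), and any extra
>     // argument data (indices, metatypes, and so on).
>     typealias AccessEntry = (
>         _ callerContext: UnsafeMutableRawPointer?,
>         _ next: NextCallback,
>         _ extraArgs: UnsafeRawBufferPointer
>     ) -> Void
>
>     // A concrete entry point for an imagined stored Int: set up at (2), run the
>     // caller's code at (3)-(5) via `next`, and finish the access at (6)-(7).
>     var storage = 0
>     let storedIntEntry: AccessEntry = { callerContext, next, _ in
>         withUnsafeMutablePointer(to: &storage) { address in
>             next.invoke(callerContext, UnsafeMutableRawPointer(address))
>         }
>     }
>
>     // The simply-nested case: if the outer access (A) has no code of its own in
>     // B.0 or B.8, (A.1) can forward the very same `next` straight into (B.1)
>     // instead of wrapping it in another sub-function.
>     func forwardingAccess(through inner: @escaping AccessEntry) -> AccessEntry {
>         return { callerContext, next, extraArgs in
>             inner(callerContext, next, extraArgs)    // (A.3 / B.1): one indirect call
>         }
>     }
>
>     // Outer's side: its (4) just bumps the value through the yielded address.
>     let bumpByOne = NextCallback(context: nil, invoke: { _, rawAddress in
>         rawAddress.assumingMemoryBound(to: Int.self).pointee += 1
>     })
>     let chained = forwardingAccess(through: storedIntEntry)
>     chained(nil, bumpByOne, UnsafeRawBufferPointer(start: nil, count: 0))
>
> The benefit of forcing every entry point into the same shape is exactly that forwarding case: the chain is set up once at (A.1), and no extra sub-function has to be extracted for (A.4).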
>
> II. Spilling
>
> "Standard spilling" means ordinary compiler behavior to save values across a call; it includes putting values in callee-save registers. The compiler has a lot of intelligence about this.
>
> "Subfunction spilling" is like standard spilling except the values must be accessible to a sub-function. The compiler has some very limited intelligence for this; I'm confident that that intelligence can grow over time, but it's not much right now. This precludes using callee-save registers unless, as you suggest, we promise to preserve them when making the callback; and note that "preserve" here really means "expect our caller and our callee to have a private contract about these registers that we shouldn't disturb". In particular, our caller might set the register, our callee might expect to see that value *and change it*, and our caller might expect to see that changed value. This is completely implementable, but it's not how the backend is really engineered right now.
>
> "Transfer spilling" means saving values across a shift in execution contexts, given only a buffer. The compiler is not well-engineered for this; we'd probably always be lowering to sub-functions early on, and doing so will block a lot of value optimizations in LLVM.
>
> III. Implementation
>
> I think SIL should clearly work with structured, unified Outer and Inner functions here, freely able to move operations between them without having to worry about what goes where. That means that we have to do some sort of lowering post-SIL (or at least in some low-level late lowering, rather than in SILGen). I expect that that alone will greatly improve our current implementation.
>
> In order to do this late lowering, we'll need some structural restrictions on SIL that should be in line with many of the other restrictions we're imposing with Semantic SIL. Given that, I don't think it's particularly harder to extract a continuation than to extract an inner sub-function.
>
> IV. Summary
>
> So to sum up:
>
> Our current continuation-style lowering is good when:
> - there's a lot of data flow from 0 to 4 or from 4 to 8,
> - there's complex control flow from 4 to 8, or
> - the inner function has no code in 6, which is common for stored/addressed accesses.
>
> A callback-style lowering is good when:
> - there's a lot of data flow from 2 to 6, or
> - a zero-allocation implementation is required.
>
> A chained callback-style lowering is good when:
> - callback-style lowerings are good, and
> - simply nested accesses are common enough to pay for the slight additional overhead.
>
> Honestly, I have a hard time seeing the callback trade-offs as being the right ones in general. Removing the malloc is required for true systems programming, but when that guarantee isn't needed, I think we can provide a large enough buffer to make allocation basically a non-factor.
OK, if you're not convinced that chained callbacks are a win, then it's kind of a wash. I agree that we'd want to represent the caller and callee as independent coroutines in SIL either way, so the tradeoffs come down to how we lower to readily-available LLVM abstractions. Either approach's tradeoffs could be mitigated with enough implementation effort: we could avoid the malloc in the continuation-style implementation by manipulating the stack pointer across calls to make room for extra state, and, on the other hand, we could increase the 0-to-4-to-8 bandwidth of the callback-style lowering by saving more registers when switching into the callee's context. One possibly small advantage I think the callback style still holds is the ability for (3) to be a tail call when the callee has no code in 6 (the stored/addressed case), which saves the test and branch the caller has to perform when we represent that case as a nullable continuation return in the continuation-style model.

-Joe

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev