Would it work for the go-capnproto2 RPC system to spawn a new goroutine for each method invocation? Then the user-supplied method implementation code would always get executed on a separate goroutine from the object's main loop, avoiding the deadlock. (There would be no "start the work" phase.) Admittedly, one downside to this approach is that method implementations would no longer get direct access to user-defined object state (also known as "this" or "self"). However, method implementations could probably still get a mutex-protected reference to the object state, and maybe that's good enough.
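For concreteness, here is a rough sketch of that shape. Everything in it (fooCall, counterState, the dispatch) is hypothetical, standing in for whatever glue the RPC system would generate; it is not the existing go-capnproto2 API:

    package main

    import (
        "fmt"
        "sync"
    )

    // fooCall stands in for a decoded Call message plus a way to send the
    // Return message back (hypothetical, for illustration only).
    type fooCall struct {
        respond func(int)
    }

    // counterState is the user-defined object state ("this"/"self"),
    // shared behind a mutex instead of confined to one loop goroutine.
    type counterState struct {
        mu      sync.Mutex
        counter int
    }

    func (s *counterState) handleFoo(call fooCall) {
        // This runs on its own goroutine, so blocking here (say, waiting
        // on an outgoing RPC) cannot stall the connection's receive loop.
        s.mu.Lock()
        s.counter++
        n := s.counter
        s.mu.Unlock()
        call.respond(n)
    }

    func main() {
        obj := &counterState{}
        done := make(chan struct{})
        call := fooCall{respond: func(n int) {
            fmt.Println("foo returned", n)
            close(done)
        }}
        go obj.handleFoo(call) // what the receive loop would do per Call
        <-done
    }

The cost is visible in handleFoo: every method body has to assume concurrent siblings and take the lock itself.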
On Fri, Jul 21, 2017 at 4:52 PM, Ross Light <[email protected]> wrote:

> (Sorry for the long response. I did not have time to make it shorter.)
>
> I see your point about how "a call returns happens before the next call starts" reaches an undesirable state for applications. I had an inkling that this could be the case, but hadn't fully experimented to see the results. However, just on terminology, I'm not sure I agree with your assessment that objects are single-threaded: because returns can come back in a different order than the calls arrived, this implies some level of concurrent (but perhaps not parallel) execution.
>
> As for your idea of mapping Cap'n Proto methods to messages on Go channels: it shuffles the problem around a bit but doesn't escape this deadlock issue. In fact, the first draft I had of the RPC system used a lot more channels, but I found it made the control flow hard to reason about (it could still be implemented this way, though). Let me give you enough background on how Go concurrency works so that we're speaking the same language.
>
> *Background*
>
> As you already know, goroutines are scheduled on OS threads. When a goroutine hits a point at which it is blocked on I/O, it is removed from the run queue and the OS thread is reclaimed for another runnable goroutine. Because the language papers over the typical thread overhead and makes it easy for the caller to create new goroutines, the practice is that all functions block (this avoids the function coloring problem <http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/>, similar to how every method ought to be a promise in Cap'n Proto RPC).
>
> There are two flavors of channels: buffered and unbuffered. A buffered channel can be thought of as a type-safe and goroutine-safe FIFO: senders place items onto the channel and receivers remove them. Importantly, though, if you try to send to a buffered channel that is full, the send blocks until an item is received/removed. An unbuffered channel can thus be thought of as perpetually being in the full state. Most of the time, an unbuffered channel is the correct choice, as it applies backpressure in the most predictable way. There was a great talk at Gophercon last week about Understanding Channels <https://about.sourcegraph.com/go/understanding-channels-kavya-joshi>, but the video's not yet online.
>
> *Problem Statement*
>
> The reproduction case is alarmingly simple and common. Alice, in Vat A, receives a call foo() from Bob, in Vat B. Alice has a reference to Bob (either through the call or prior calls). foo() is a call that does the following:
>
> 1. Call bar() on Bob
> 2. Wait for the result
> 3. Return a value based on computation on the result
>
> Now let's talk broadly about implementation constraints of the RPC system bridging Vat A and Vat B. There are at least two concurrent (but definitely not parallel) processes at play: receiving messages from the wire and sending messages on the wire. Both of these mutate the set of four (five) tables, so naturally a mutex protects this data structure. When the receiver process encounters a Call, it must acquire the mutex, start the work, then release the mutex. The "start the work" piece is what we're interested in here.
>
> Let's assume, as you say, that each call comes over an unbuffered channel.
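Before going further, a tiny self-contained demo of the channel semantics from the *Background* section above. This is plain Go, no RPC; it shows the send-blocking behavior that the scenario below hinges on:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Buffered: sends succeed immediately while there is capacity.
        buffered := make(chan int, 1)
        buffered <- 1 // does not block
        fmt.Println(<-buffered)

        // Unbuffered: a send blocks until a receiver is ready. This is
        // the backpressure that the deadlock below depends on.
        unbuffered := make(chan int)
        go func() {
            unbuffered <- 2 // blocks until main is ready to receive
        }()
        time.Sleep(10 * time.Millisecond) // the receiver shows up "late"
        fmt.Println(<-unbuffered)
    }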
> Each capability creates a loop where it receives a new call, optionally does a bit of processing to "start the work", then sends back a response. If there's an RPC or longer I/O operation, it spawns a goroutine that sends back the response, so as to avoid blocking that loop.
>
> Here's where the language difference comes into play and provides a footgun that C++ does not have: Step 2 of foo(). In the C++ implementation, long-running work does have a different function "color" and returns a future, which you should not wait on (and which the API makes convoluted to wait on) without starting a concurrent process to finish the rest of the work. I do have futures in the Go implementation, but instead of exposing a ".then()" method, they expose a ".wait()" method, since this is much more idiomatic. In Go, it would be quite easy to accidentally perform a long-running operation in that "start the work" phase. This is fine if the long-running work doesn't use RPCs, but if you happen to wait for an RPC to return, then this deadlocks: the RPC connection will still be blocked on servicing the "start the work".
>
> With a channel, you merely postpone the problem to the next time Bob calls foo() on Alice. The bad sequence (from the very beginning) is:
>
> 1. Bob calls foo() on Alice.
> 2. Vat A sends the foo() message to Alice's channel.
> 3. Alice receives the foo() message.
> 4. Vat A releases the mutex on the tables and waits for the next message to come over the wire from Vat B.
> 5. Alice calls bar() on Bob.
> 6. Vat A acquires the mutex on the tables, adds the question entry to the table, writes the Call to Vat B, then releases the mutex. Vat A gives a future back to Alice.
> 7. Alice starts waiting on the future.
> 8. Bob calls foo() on Alice again. Call this foo'().
> 9. Vat A attempts to send the foo'() message to Alice's channel. It blocks, since Alice is not ready to receive from the channel (it's still processing foo()).
> 10. Bob is finished with bar() and attempts to write its return to Vat A. Vat A is now deadlocked: it can't receive the new wire message because it's waiting for Alice to receive foo'(), but foo'() cannot be received until foo() receives bar()'s response.
>
> *How this surfaces in go-capnproto2*
>
> As mentioned above, the first few revisions of the Go RPC implementation used channels in this sort of way under the hood to call methods. However, it did not wait for the call to be acknowledged, which destroys E-Order. I've since moved to a mutex-based solution (since it reduces the number of goroutines that the RPC system keeps around and makes it easier to debug), but fundamentally server.Ack <https://godoc.org/zombiezen.com/go/capnproto2/server#Ack> is equivalent to starting a channel receive.
>
> With the current go-capnproto2 implementation, you can't even safely start a call before the server.Ack call (which starts the work), since the connection's lock is not recursive. I have an idea for how to fix that: create a temporary sub-mutex that is propagated via the Context. This is definitely a necessary change. I'm still a bit scared that there could be deadlock: Vat A calls to Vat B to service a request from Vat C while Vat A calls to Vat C to service a call from Vat B. (Note that this is not 3-party; this is just multiple 2-party connections.) I'm not certain about this, however, and would love to be wrong, since the workaround would seem to be a process-wide mutex on Cap'n Proto RPC work.
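That bad sequence can be modeled without any Cap'n Proto at all. In the sketch below (all names illustrative), one goroutine plays Vat A's receive loop and another plays Alice; running it ends with the Go runtime's "all goroutines are asleep - deadlock!" error, with Alice, the receive loop, and the sender parked exactly as in steps 7, 9, and 10:

    package main

    // Stripped-down model of the bad sequence above. No Cap'n Proto is
    // involved; main plays Vat B's side of the wire.

    func main() {
        wire := make(chan string)  // messages arriving from Vat B
        alice := make(chan string) // Alice's unbuffered call channel
        barAnswer := make(chan int)

        // Vat A's receive loop: calls go to Alice, returns go to futures.
        go func() {
            for m := range wire {
                if m == "call" {
                    alice <- m // step 9: blocks while Alice is mid-foo()
                } else {
                    barAnswer <- 1 // deliver bar()'s result
                }
            }
        }()

        // Alice: services foo() by calling bar() and waiting on its future.
        go func() {
            for range alice {
                <-barAnswer // step 7: waiting inside "start the work"
            }
        }()

        wire <- "call"   // steps 1-3: foo() reaches Alice
        wire <- "call"   // step 8: foo'() arrives; the loop jams at step 9
        wire <- "return" // step 10: bar()'s return can never be read
    }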
> However, there's still the looming footgun here: accidental deadlocks while starting work. I don't see a way around this in general. And it bears repeating: this is actually a problem for C++ too, but there it is much more obvious from looking at the code that it's blocking on something it shouldn't. Whether the Go implementation uses straight-line code or channels doesn't make much of a difference here: there's still straight-line code in the channel-based approach, but splitting the deferred work from the initial work demands more of the application (spawning a goroutine and ensuring a resolve-promise method gets called, versus just returning). Channels versus functions comes down to a question of API appearance rather than functional change.
>
> *Mitigation*
>
> The only guaranteed safe way to mitigate these issues is to make the Go capabilities act serially to preserve E-Order. It could be provided in an opt-out manner, but I'm not even sure how I would write code that avoids the recursive mutex lock problem. I'm open to other ideas on how to solve this.
>
> Thanks for the read,
> -Ross
>
> On Thu, Jul 20, 2017 at 10:25 AM Kenton Varda <[email protected]> wrote:
>
>> On Wed, Jul 19, 2017 at 10:26 PM, Ross Light <[email protected]> wrote:
>>
>>> So in this old thread <https://groups.google.com/d/topic/capnproto/4SLfibQPWFE/discussion>, it's stated that the "call is received" event requires calling into application code. From an implementation standpoint, this declares that receiving a call in the RPC system is a critical section that involves crossing over the application-code boundary, which may try to acquire the same mutex (by making a call on the connection). While you can postpone this problem by adding queueing, I'm already a little nervous about how much queueing is required by the protocol.
>>>
>>> I'd like to suggest that the E model be considered: each capability is a separate single queue. Instead of "call A is received happens before call B is received", use "call A returns happens before call B starts". This simplifies implementation because it prescribes what ought to happen in the critical section (enqueue or throw an overload exception), so no application code needs to be invoked in the critical section. This might not be a problem for the C++ implementation right now, but once fibers are involved, I think it would become one.
>>
>> If I understand what you're proposing correctly, then another way of saying it is: "A call must return a result before the next call on the same object can begin."
>>
>> (To be clear, this certainly isn't "the E model". Strictly speaking, in E, calls don't "return" anything, but they do eventually resolve a promise, and there's no requirement that that resolution happens before the next call can proceed.)
>>
>> I don't think this proposal would work. You're saying that if a method call foo() wants to allow other methods to be called before foo() produces a result, then foo() must produce a result via a callback rather than via return. foo() would need to take, as one of its parameters, a capability which it calls with the eventual results.
>>
>> This would, of course, lead to all the usual "callback hell" problems we see in async I/O. Over time, we've reached a consensus that the right way to solve "callback hell" is to use promises.
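In Go terms, the two shapes Kenton is contrasting look roughly like this; the functions are hypothetical, and a buffered channel stands in for a promise:

    package main

    import "fmt"

    func compute() int { return 42 }

    // Callback style: the caller must hand foo() a capability to deliver
    // the eventual result to, the shape Kenton describes above.
    func fooCallback(cb func(int)) {
        go func() { cb(compute()) }()
    }

    // Promise/future style: foo() returns something the caller can wait
    // on later, which is the shape Cap'n Proto methods already have.
    func fooFuture() <-chan int {
        out := make(chan int, 1)
        go func() { out <- compute() }()
        return out
    }

    func main() {
        done := make(chan struct{})
        fooCallback(func(n int) {
            fmt.Println("via callback:", n)
            close(done)
        })
        <-done
        fmt.Println("via future:", <-fooFuture())
    }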
>> Promises allow us to express the eventual results of an async function as a return value, which is much more natural than using callbacks. They also make it much clearer how errors are to be propagated, and make it harder to accidentally leak errors.
>>
>> So the next logical step would be to introduce a notion of promises into Cap'n Proto interface specs. Let methods return promises instead of raw values, and then they are free to interleave as necessary.
>>
>> But then what would happen? Probably, everyone would declare all of their methods to return promises, to give themselves flexibility to change their implementation in the future if needed. In fact, there'd be even more temptation to do this than there is in C++ and JavaScript today, because the client of a Cap'n Proto interface already has to treat the result as a promise for latency and pipelining reasons. So, making a method return a promise would create no new inconvenience on the client side (because clients already have to deal with promises either way), and it would create no inconvenience on the server side (because you can return an immediately-resolved promise basically just as easily as returning a value). So, everyone would declare every method to return a promise.
>>
>> The next step, then, would be to say: OK, since everyone is declaring all returns to be promises anyway, we're just going to say that it's implied. All methods actually return promises. You don't need to declare it.
>>
>> And then... we'd be back to *exactly* where we are today.
>>
>> Today, the right way to think about Cap'n Proto methods is to say that all methods return promises.
>>
>>> I believe that the same properties can be obtained by pushing this into interface definitions: if an interface really wants to declare that operations can happen in parallel, then there can be a root capability that creates a capability for each operation. Then the RPC subsystem can know much more about how much work is being scheduled.
>>>
>>> I realize this would be a big change, but I can't see a good way to avoid this problem in any implementation of the RPC protocol that tries to use a connection concurrently. Effectively, this forces all implementations to be single-threaded AFAICT. Let me know what you think.
>>
>> Cap'n Proto technically only requires that each *object* is single-threaded. It's based on the actor model, which is actually similar to the CSP model on which Go's concurrency is based. In fact, maybe the problem is that we're trying to map Cap'n Proto to the wrong idioms in Go.
>>
>> Imagine this design: instead of mapping capnp methods to Go functions, we map them to messages on a Go channel. Each object reads from a channel. Each message on the channel initiates a call and specifies a response channel to which the call results should be sent when they are ready.
>>
>> This design would achieve E-Order while allowing overlapping calls and being idiomatic with Go's concurrency model. (A sketch of this mapping appears a little further down.)
>>
>> Thoughts?
>>
>> =====================
>>
>> On another note -- getting away from specific languages or models -- I don't think it's practical, in most use cases, to try to utilize multiple processor cores to service a single connection. (Note I'm avoiding the word "threads" here, since languages like Go blur the meaning of "threads" between processor cores and coroutines. I'm talking about cores, not coroutines.)
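Here is a minimal sketch of the channel-per-object mapping Kenton describes above; the shapes are hypothetical, not go-capnproto2's actual API:

    package main

    import "fmt"

    // A capnp method becomes a message on the object's channel, carrying
    // a response channel for the results.
    type fooCall struct {
        arg    int
        result chan<- int // where the return value is eventually sent
    }

    // serve is the object's main loop. Receiving a call establishes
    // E-Order; the work itself (and the send on result) runs on its own
    // goroutine, so later calls can overlap with it.
    func serve(calls <-chan fooCall) {
        for c := range calls {
            c := c // capture for the goroutine (pre-Go 1.22 loop semantics)
            go func() { c.result <- c.arg * 2 }()
        }
    }

    func main() {
        calls := make(chan fooCall)
        go serve(calls)

        r1 := make(chan int, 1)
        r2 := make(chan int, 1)
        calls <- fooCall{arg: 1, result: r1} // call A is received...
        calls <- fooCall{arg: 2, result: r2} // ...before call B (E-Order)
        fmt.Println(<-r1, <-r2) // results may be produced in either order
    }

Note that the loop acknowledges (receives) each call before the previous call's results are sent, which is what preserves E-Order while still allowing calls to overlap.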
>> The problem is, the connection is implicitly a synchronization point. You can't have multiple cores literally reading from the same connection at the same time. So you necessarily have to have one core (at a time) servicing the connection and then passing data off to other cores. But the cost of synchronization and data transfer between cores is likely to outweigh the benefit in most use cases. You'd only get a benefit if one client is making multiple CPU-heavy calls concurrently, which isn't all that common. In all other cases you'd lose performance.
>>
>> Instead, I think a better model is to balance whole connections across cores. A client can make multiple connections if it wants to utilize multiple cores. When three-party handoff is implemented, the server will even be able to instruct the client to open additional connections, transparently to the app. This way the OS kernel actually knows which thread any particular message is destined for, and directly schedules that thread when a message arrives. No extra context switches!
>>
>> This approach works today (e.g. using Cap'n Proto C++).
>>
>> -Kenton
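A client-side sketch of that connection-per-core balancing; dialVat and the address are placeholders, not a real go-capnproto2 API:

    package main

    import (
        "fmt"
        "net"
        "runtime"
    )

    // The client opens one connection per core and spreads calls across
    // them, so servicing each connection stays effectively single-threaded.
    func dialVat(addr string) (net.Conn, error) {
        return net.Dial("tcp", addr)
    }

    func main() {
        n := runtime.GOMAXPROCS(0) // one connection per available core
        conns := make([]net.Conn, 0, n)
        for i := 0; i < n; i++ {
            c, err := dialVat("vat-b.example:4000") // hypothetical address
            if err != nil {
                fmt.Println("dial:", err) // expected when run as-is
                return
            }
            conns = append(conns, c)
        }
        // Issue call i on conns[i%len(conns)], so each connection (and
        // the goroutines servicing it) can stay on one core.
        for _, c := range conns {
            c.Close()
        }
    }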
