Re: [capnproto] Some questions about the RPC spec

Thomas Leonard Thu, 20 Jul 2017 13:05:41 -0700

On 20 July 2017 at 19:38, Kenton Varda <[email protected]> wrote:
> On Thu, Jul 20, 2017 at 4:02 AM, Thomas Leonard <[email protected]> wrote:
>>
>> I thought that must be the reason originally, but it seems that
>> takeFromOtherQuestion requires sharing even if it can only be used
>> once, because the struct is held by the original answer (for
>> pipelining) and also by the question that took it.
>
>
> True, but this is a restricted case, and may still allow the implementation
> more freedom than general sharing would. For example, for pipelining
> purposes, technically the implementation only needs to keep the capabilities
> around, along with remembering their pointer paths. It doesn't otherwise
> need to remember the content of the response.


I did start off trying to implement it that way, but then I realised
that questions don't usually hang around long anyway, so it didn't
seem worth the effort.

>> >> If you're implementing level 1 (two-party), then really the only place
>> >> where this applies is when you receive a capability that the receiver
>> >> hosts as part of a return or resolve after you have made calls on the
>> >> promised capability. This implies that the RPC system needs to keep
>> >> track of which parts of the answer have had calls made on them. When
>> >> this occurs, the receiver gives the application code an embargoed
>> >> client, and then sends a Disembargo with senderLoopback set. It
>> >> releases the embargo once the same disembargo ID is returned with
>> >> receiverLoopback set.
>>
>> Maybe I got this bit wrong. I attached the "used" flags to the
>> question, but maybe I should be tagging the reference to the question
>> instead. Can different references to the same question need different
>> disembargoes? e.g. should forwarding a message mark the promised
>> answer as needing a disembargo or not?
>
> Sorry, I don't understand your question here.

Maybe it doesn't make sense, or only with my implementation, but it
seems we have two objects for a question/export:

- a proxy that always sends to the remote peer
- a switchable proxy that forwards to the previous object until the
question returns, and then sends to the new target (possibly after a
disembargo)

I was just wondering which proxy should track whether it has been used
(and, therefore, whether it needs a disembargo).

If we had the implementor's guide that was mentioned earlier, it could
probably cover this. My current implementation muddles these two up,
which is why it's delivering things out of order, so my question is
probably muddled up too. I'll need to think about this a bit more.
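For concreteness, here is one possible arrangement as a hypothetical OCaml sketch (the names `proxy`, `resolve` and `disembargo` are illustrative, not capnp-rpc's API). It assumes the "used" flag lives on the switchable proxy: the flag is set whenever a call goes out while the proxy points at a remote promise, and is cleared only once the proxy settles on a local target. It ignores the three-vat case and does not build the Disembargo messages themselves.

```ocaml
(* Hypothetical sketch: a switchable proxy tracking whether it needs a
   disembargo when it resolves. Not the capnp-rpc implementation. *)
type target =
  | Remote_promise of string  (* calls are relayed to the peer *)
  | Local of string           (* calls are handled in this vat *)

type proxy = {
  mutable target : target;
  mutable used : bool;               (* call sent while target was remote? *)
  mutable embargoed : bool;
  held : string Queue.t;             (* calls queued during an embargo *)
  sent : (target * string) Queue.t;  (* where each call went, in order *)
}

let create t =
  { target = t; used = false; embargoed = false;
    held = Queue.create (); sent = Queue.create () }

let deliver p msg = Queue.add (p.target, msg) p.sent

let call p msg =
  if p.embargoed then Queue.add msg p.held
  else begin
    (match p.target with
     | Remote_promise _ -> p.used <- true
     | Local _ -> ());
    deliver p msg
  end

(* Switch to a new target. Returns [true] if a Disembargo must be sent:
   earlier calls went via a remote promise and the new target is local. *)
let resolve p new_target =
  let is_local =
    match new_target with Local _ -> true | Remote_promise _ -> false
  in
  let needs_embargo = p.used && is_local in
  p.target <- new_target;
  if is_local then p.used <- false;
  if needs_embargo then p.embargoed <- true;
  needs_embargo

(* The Disembargo came back with receiverLoopback: release held calls. *)
let disembargo p =
  p.embargoed <- false;
  Queue.iter (deliver p) p.held;
  Queue.clear p.held
```

In this model the flag belongs to each proxy rather than to the question, so two references to the same question could disagree about whether a disembargo is needed; whether that is the right choice is exactly the open question above.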

>> > Example:
>>
>> That example is straightforward, but there are more complex cases
>> that are unclear to me. Here's one I'm not sure about:
>>
>> There are two vats, Client and Server, each of which starts with a
>> reference to the other's bootstrap service. All calls either return a
>> single capability (field-name `x`) or Void.
>>
>> 1. Client makes a call, q1, on the server's bootstrap object, getting a
>>    promise a=q1.x.
>> 2. Client makes another call, q2, on the same target, getting promise
>>    b=q2.x.
>> 3. Server asks one question, q3 (c=q3.x).
>> 4. Client responds to q3 with a (the unresolved promised cap from its q1).
>> 5. Server responds to q1 with client_bs (the client's bootstrap service,
>>    resolved).
>> 6. Server responds to q2 with c (q3.x, still unresolved).
>> 7. Client makes call m1 on b (sent to q2).
>> 8. Client receives response a=client_bs (no embargo needed).
>> 9. Client receives response b=q3.x, which is a.
>>    This was q1.x at the time q3 returned, but is client_bs now. Which
>>    should it use?
>>    If client_bs, it embargoes the target due to m1.
>>    If not, b now points at the returned q1, which seems odd.
>>
>> 10. Client makes call m2 on b (which is then held at the embargo).
>> 11. Server receives response that c=q1.x (which is client_bs).
>> 12. Server receives m1 and forwards it to q3.x.
>> 13. Server sends disembargo response back to client.
>> 14. Client receives m1 and forwards it to q1 (the resolution it gave for
>>     q3).
>> 15. Client disembargoes b and sends m2 to client_bs.
>
>
> Nice example!
>
> It looks like the C++ implementation today will decide b = q1.x, and never
> allow it to further resolve to client_bs. This "works" but is clearly
> suboptimal.
>
> For a correct solution, we need to recognize that Disembargo messages can
> "bounce" multiple times:

Does it alternate between being a disembargo request and a disembargo
response as this happens?
Does the 3-vat case complicate things?

> The disembargo sent in step 9 has a final destination of client_bs.
>
> In order to get there, it has to bounce back and forth between the client
> and server twice:
> * The client sends it towards q2.x.
> * The server, recognizing that it resolved q2.x to q3.x, reflects the
> embargo towards q3.x.
> * The client, recognizing that it resolved q3.x to q1.x, reflects back to
> the server again.
> * The server, recognizing that q1.x resolved to client_bs, finally reflects
> back to client_bs.
>
> This gives m1 enough time to arrive before the disembargo.
>
> It looks like this is not implemented correctly in C++ currently. It appears
> the C++ implementation ignores disembargo.messageTarget in the case that the
> Disembargo has type `receiverLoopback`. This is incorrect -- it needs to
> verify that the embargo has reached its final destination, not an
> intermediate promise. (However, it is "saved" by the suboptimal behavior
> mentioned above.)

OK, I'll try to match the C++ behaviour for now.
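The bouncing Kenton describes can be modelled with a small lookup: each vat records which of its answers it resolved to a capability hosted by the peer, and the Disembargo is reflected back across the connection until it reaches a target that is not one of those resolved promises. This is a simplified two-vat sketch with hypothetical names taken from the example; it does not model whether each hop is a request or a response.

```ocaml
(* Hypothetical two-vat model of a Disembargo bouncing between peers. *)
type vat = Client | Server

let other = function Client -> Server | Server -> Client

(* Answers each vat resolved to a capability hosted by the *other* vat,
   following the example above. *)
let resolutions = function
  | Server -> [ ("q2.x", "q3.x"); ("q1.x", "client_bs") ]
  | Client -> [ ("q3.x", "q1.x") ]

(* [route ~from target] lists the (receiving vat, target) hops until the
   disembargo reaches a target that is not a resolved promise.
   No cycle detection: a resolution loop would not terminate. *)
let route ~from target =
  let rec go vat target acc =
    let acc = (vat, target) :: acc in
    match List.assoc_opt target (resolutions vat) with
    | Some next -> go (other vat) next acc  (* reflect back to the peer *)
    | None -> List.rev acc                  (* final destination reached *)
  in
  go (other from) target []
```

For the disembargo sent in step 9, `route ~from:Client "q2.x"` visits q2.x (server), q3.x (client), q1.x (server) and finally client_bs (client), matching the four bullets above.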

> On another note, you say you found this with AFL, which is amazing. Could
> your fuzzing strategy be applied to the C++ implementation as well?

Maybe. Here's how it works:

To simplify things, my OCaml capnp-rpc library is in two parts. One
provides the RPC logic over abstract message types, and the other
provides an implementation using the Cap'n Proto serialisation for the
messages. Most of the unit-tests check the core logic directly, using
a simpler message type where a payload is just a test string and an
array of capability pointers. The fuzz tests use a mutable struct
containing fields that help detect violations.
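A rough illustration of that split, using an OCaml functor over a hypothetical `PAYLOAD` signature. The module names and the `count_caps` function are assumptions for the sketch, not the library's actual interfaces.

```ocaml
(* Hypothetical: core logic written against an abstract payload type. *)
module type PAYLOAD = sig
  type cap
  type t
  val caps : t -> cap array  (* the capabilities carried by a payload *)
end

(* Unit-test payload: a test string plus an array of capability pointers. *)
module Test_payload = struct
  type cap = int  (* stand-in for a capability, for this sketch only *)
  type t = { text : string; caps : cap array }
  let caps t = t.caps
end

(* A trivial piece of "core logic" that works for any PAYLOAD. *)
module Core (P : PAYLOAD) = struct
  let count_caps (msg : P.t) = Array.length (P.caps msg)
end

module Test_core = Core (Test_payload)
```

A second instance of `Core` could then be applied to a module wrapping the real Cap'n Proto messages, without changing the core logic.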

The fuzz tests set up some vats (two or three) in a single process and
then have them perform operations based on input from the fuzzer.
Each step selects one vat and performs a random (fuzzer-chosen)
operation out of:

1. Request a bootstrap capability from a random peer.
2. Handle one message on the incoming queue.
3. Call a random capability, passing randomly selected capabilities as
arguments.
4. Finish a random question.
5. Release a random capability.
6. Add a capability to a newly created local service.
7. Answer a random question, passing randomly selected capabilities as
the response.

When it runs out of input data from the fuzzer, it releases all
capabilities, answers all questions and allows the system to become
idle.
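A minimal sketch of how the fuzzer's bytes might be decoded into steps, assuming two bytes per step (one selecting the vat, one the operation). The encoding and all names here are hypothetical; the real harness may consume its input differently, and the arguments each operation needs are omitted.

```ocaml
(* Hypothetical decoding of fuzzer input into (vat, operation) steps. *)
type op =
  | Bootstrap       (* 1. request a bootstrap cap from a random peer *)
  | Handle_message  (* 2. handle one message on the incoming queue *)
  | Call            (* 3. call a random capability *)
  | Finish          (* 4. finish a random question *)
  | Release         (* 5. release a random capability *)
  | New_service     (* 6. add a capability to a new local service *)
  | Answer          (* 7. answer a random question *)

let ops = [| Bootstrap; Handle_message; Call; Finish; Release; New_service; Answer |]

(* Two bytes per step; a trailing odd byte is ignored. *)
let decode ~n_vats input =
  let rec go i acc =
    if i + 1 >= String.length input then List.rev acc
    else
      let vat = Char.code input.[i] mod n_vats in
      let op = ops.(Char.code input.[i + 1] mod Array.length ops) in
      go (i + 2) ((vat, op) :: acc)
  in
  go 0 []

(* Perform every decoded step, then clean up once input runs out
   (release caps, answer questions, let the system go idle). *)
let run ~n_vats ~step ~shutdown input =
  List.iter (fun (vat, op) -> step vat op) (decode ~n_vats input);
  shutdown ()
```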

The fuzz tests include, in each call's payload, a sequence number
and a (mutable) struct containing the sending reference's counters:

type cap_ref_counters = {
  mutable next_to_send : int;
  mutable next_expected : int;
}

When the message arrives, the target service checks that the sequence
number in the content matches the current value of `next_expected`, and
then increments `next_expected`.
So it should always detect messages that arrive out of order.
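The counter check can be sketched as follows. `send`, `receive` and the `msg` type are hypothetical helpers; the counters struct is shared directly between sender and receiver, which only works because all the vats live in one process.

```ocaml
type cap_ref_counters = {
  mutable next_to_send : int;
  mutable next_expected : int;
}

(* Hypothetical test payload: sequence number plus the sender's counters. *)
type msg = { seq : int; counters : cap_ref_counters }

exception Out_of_order of int * int  (* expected, received *)

(* Called when a reference sends a message. *)
let send counters =
  let seq = counters.next_to_send in
  counters.next_to_send <- seq + 1;
  { seq; counters }

(* Called by the target service when the message arrives. *)
let receive msg =
  let expected = msg.counters.next_expected in
  if msg.seq <> expected then raise (Out_of_order (expected, msg.seq));
  msg.counters.next_expected <- expected + 1
```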

Another way it takes advantage of everything running in one process is
that it maintains a second reference graph, but one which doesn't use
CapTP. When it requests a bootstrap capability over CapTP, it also
returns a direct pointer to the target service. So, it's a copy of the
reference graph but with all vat-spanning links replaced with direct
pointers. Then, it checks that every message is delivered to the
service it would have been delivered to if there were no network in
the way.
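A hypothetical sketch of the delivery check: each test message carries the direct pointer taken from the sending reference's copy in the direct graph, and the service that CapTP actually delivers it to compares that pointer with itself using physical equality. The names are illustrative only.

```ocaml
(* Hypothetical: a service in the direct (network-free) reference graph. *)
type service = { name : string; mutable received : string list }

(* Test message: body plus the service it would reach with no network. *)
type msg = { body : string; expected : service }

(* Run by whichever service CapTP actually delivered the message to. *)
let handle self msg =
  if msg.expected != self then
    failwith (Printf.sprintf "%S delivered to %s, expected %s"
                msg.body self.name msg.expected.name);
  self.received <- msg.body :: self.received
```

Physical equality (`!=`) is used so that two services with the same name are still distinguished.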

I leave AFL running my binary for a while with the --fuzz option
(which disables logging to keep things fast).
When it finds a violation it leaves it in the crash directory. Then I
run the fuzz binary on it manually without --fuzz, which turns on
logging and runs a load of sanity checks at each step, as well as
dumping the state of the system at each step. It also outputs an OCaml
unit-test, which can be cut-and-pasted into the test-suite. The
unit-tests look like this, after being cleaned up a bit:

https://github.com/mirage/capnp-rpc/blob/f5a32455c41056eaa40b3074f1cfb33741854e69/test/test.ml#L462

If the test-case is too long, afl-tmin can often shorten it.

There are plenty of other things it could be made to check, e.g. that
forked references can't access anything before their parent, that
messages to valid targets are always eventually delivered, that after
letting the system become idle all references point directly to the
vat containing their target in the direct reference graph, that
malicious vats can't cause protocol violations in connections between
good vats, etc. However, I have enough bugs to fix with the current
tests ;-)

I don't know if that's any use to you - I'm not sure how the C++ code
is structured.


-- 
talex5 (GitHub/Twitter)        http://roscidus.com/blog/
GPG: 5DD5 8D70 899C 454A 966D  6A51 7513 3C8F 94F6 E0CC

-- 
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
Visit this group at https://groups.google.com/group/capnproto.
