Re: [capnproto] Cap'n Proto for Elm

Prasanth Somasundar Fri, 31 May 2019 11:10:50 -0700

>    Actual line of code from Cloudflare Workers:
>    case PipelineDef::Stage::Worker::Global::Value::JSON:
>    So again, I'm not sure this is a problem specific to certain languages.
>    :)


> Oof, point taken. Slightly off topic for this list, but how would you
> feel about accepting (wire compatible) patches to the sandstorm schema
> to flatten the namespace? Here's another doozy (from the output of the
> go code generator):
>
>     websession.WebSession_WebSocketStream_sendBytes_Results_Future

That's pretty bad in both cases. However, in C++, you get the benefit of using 
declarations on namespaces and classes, so if you wished, you could do 
something like:


using Global = PipelineDef::Stage::Worker::Global;

case Global::Value::JSON:

I can say from experience writing quite a bit of C++ code that consumed 
protobufs that this was the only thing that made it look sane. Unfortunately, 
if we use function name based namespacing (if you want to call it that), we 
cannot get this benefit easily as you'd have to rename all the functions 
consistently and on an ad-hoc basis.
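
To make the aliasing point concrete, here is a minimal sketch. The nested types are hypothetical stand-ins I made up to mirror the shape of the name quoted above; they are not the real generated code, and the `TEXT` enumerant and `describe` function are invented purely for illustration.

```cpp
#include <string>

// Hypothetical stand-in for deeply nested generated types. The real
// generated code is not reproduced here; only the nesting is mimicked.
struct PipelineDef {
  struct Stage {
    struct Worker {
      struct Global {
        enum class Value { JSON, TEXT };
      };
    };
  };
};

// One alias near the top of the file shortens every use site below it.
using Global = PipelineDef::Stage::Worker::Global;

// Illustrative consumer: the case labels only need the short alias.
std::string describe(Global::Value v) {
  switch (v) {
    case Global::Value::JSON: return "json";
    case Global::Value::TEXT: return "text";
  }
  return "unknown";
}
```

With names flattened into something like `PipelineDef_Stage_Worker_Global_Value_JSON`, there is no equivalent single line that shortens every use site, which is the benefit I'd miss.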

With Elm and Haskell, I believe the only layer of namespacing that's provided 
is the module name which is why both Ian and I gravitated to it initially.

Upon further reflection, I think this might even be important enough that I'll 
toy with single-module-per-struct for a prototype.

> Arrays are definitely workable for prototyping, but if you switch over
> to a flat buffer representation in the future your current design for
> updates will start being O(length of segment) instead of O(log(length of
> segment)), so that's something to keep in mind.

Fair enough.

> One possible design, which I think I'd do for the Haskell implementation
> if I were to start from scratch: don't support in-place modifications.
> Have encode and decode. This way your read support can do the obvious
> thing with buffers, and your write support can be an 'Encoder' type
> which is a wrapper around Bytes.Encoder + some metadata about addresses.

I was and still am not really sure how to handle writes properly. It doesn't 
feel super good. I don't think that Elm has any support for mutable state at 
the moment and I'm not sure what that would look like. The good news for Elm at 
least is that it runs on an event loop and you'd be able to atomize all changes 
using a good API surrounding `Cmd` if you're ok with async writes. The problem 
currently is that you have three options, and none of them are good:

  1.  Have capnp live in JavaScript land for a good write API, but a terrible 
read API, since your reads become async in Elm.
  2.  Have capnp live in Elm for a good read API, but terrible writes in terms 
of performance, complexity, or API.
  3.  Have capnp live in both Elm and JavaScript and deal with synchronization 
and data duplication.

I guess I'll think further about what I want writes to look like and publish to 
the Elm discourse. Just for clarity, I did post a quick 
thread<https://discourse.elm-lang.org/t/potential-scenario-for-packed-array-of-integers-capnproto/3671/2>
 mentioning that I was looking into this. It got little traction initially, 
probably for good reason, but I'll post the design there once I'm satisfied 
with the API. If Evan doesn't respond, then I'll try to prod some other way.
>    Still, it's not clear to me why you'd use Cap'n Proto if you're going
>    to do a full serialization/deserialization. Just use Protobufs at that
>    point. You could argue that this existing for completeness is valuable
>    i.e. you can run capnp on your backend and not be forced to translate
>    into a protobuf on your frontend, but at that point I'm not sure that
>    this is a good enough reason to write a library like this.

> Fair enough. I'd say for RPC (which you've said you're not shooting for,
> so maybe moot in this case) or if you've got some existing system you
> want to talk to; it would be neat to have this for writing sandstorm
> apps.

To be clear, this initial design was just something to get started. I'd like to 
implement RPC down the line, it's just that I don't want to start by thinking 
about it or making it the goal.
> The salient difference in behavior here is that the traversal limit is
> part of the struct, rather than the message, so if you have a branching
> structure (like a tree), it can't really protect you, since it isn't
> shared across branches.
>
> I don't see an ergonomic way to address this.

That's interesting. So it's clearly different behavior from what the other 
libraries are using, but unless I'm missing something (which is quite 
possible), I believe it should still guard against an attack. If the case we're 
guarding against is a cyclic pointer and therefore infinite recursion, and the 
client library has no means of creating a custom struct, then the counter is 
still going to guard against a poorly encoded/adversarial message.

That said, this can still be bad. In the worst case, as you mentioned, the 
message is a tree and the client has decided to traverse all of its branches. 
The depth is then bounded by the limit, but the total data traversed could be 
on the order of 2^limit. This can be mitigated by lowering the threshold from 
64 MiB to 32 MiB or even 16 MiB. It might even make sense to have separate 
traversal and message size limits here to allow larger messages with limited 
traversal depth.

If someone with a better security background could vet that, I'd appreciate it.
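
Here is a toy sketch of the difference between the two behaviors, assuming a complete binary tree and a cost of one unit per visited node. Both function names and the cost model are my own simplifications, not how any capnp library actually accounts for traversal.

```cpp
#include <cstdint>

// Toy model: walk a complete binary tree with `depth` levels below the
// root, charging one unit of traversal budget per visited node.

// Per-branch budget: the remaining limit is copied into each child, so it
// bounds the length of any single path but not the total number of visits.
std::uint64_t visitPerBranch(int depth, std::uint64_t limit) {
  if (depth < 0 || limit == 0) return 0;
  std::uint64_t remaining = limit - 1;
  return 1 + visitPerBranch(depth - 1, remaining)
           + visitPerBranch(depth - 1, remaining);
}

// Shared budget: every branch draws from the same counter, so the total
// number of visits can never exceed the initial budget.
std::uint64_t visitShared(int depth, std::uint64_t& budget) {
  if (depth < 0 || budget == 0) return 0;
  --budget;
  std::uint64_t visited = 1;
  visited += visitShared(depth - 1, budget);
  visited += visitShared(depth - 1, budget);
  return visited;
}
```

On a tree deeper than the limit, `visitPerBranch(20, 10)` visits 2^10 - 1 = 1023 nodes, while `visitShared` starting with a budget of 10 visits exactly 10. Both terminate, which is the cyclic-pointer case, but only the shared counter bounds the total work.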

> Proto3's motivation for removing defaults, as I understand it, is that the 
> designers of Go very much wanted for Protobufs to be represented as raw 
> structs. Go does not have a concept of constructors for raw structs; they are 
> simply zero-initialized. Instead of improving their language, they asked for 
> changes to Protobuf.

BWAHAHAHA 🤦‍♂️

> Re: "mmap()" in elm in another of your messages, this is why I suggested that 
> Mezuzza get in touch with Evan early

Didn't realize my name wasn't appearing on google groups. Seems that I actually 
had this account in Google since 2010 and forgot about it. We can just use my 
real name.

--Prasanth
________________________________
From: Ian Denhardt <[email protected]>
Sent: Thursday, May 30, 2019 6:36 PM
To: 'Kenton Varda' via Cap'n Proto; Kenton Varda
Cc: David Renshaw; prasanth somasundar; capnproto
Subject: Re: [capnproto] Cap'n Proto for Elm

Quoting 'Kenton Varda' via Cap'n Proto (2019-05-30 14:22:59)

>    Actual line of code from Cloudflare Workers:
>    case PipelineDef::Stage::Worker::Global::Value::JSON:
>    So again, I'm not sure this is a problem specific to certain languages.
>    :)

Oof, point taken. Slightly off topic for this list, but how would you
feel about accepting (wire compatible) patches to the sandstorm schema
to flatten the namespace? Here's another doozy (from the output of the
go code generator):

    websession.WebSession_WebSocketStream_sendBytes_Results_Future

>    Hmm. It's unfortunate that this means that if someone adds a non-union
>    field to a struct that previously contained only a union, Haskell code
>    using the protocol will break and probably need major rewrites.

I don't know about "major rewrites" -- the changes would be fairly
mechanical, and not outside of the realm of something you could write a
tool to mostly automate. You basically just end up having to wrap and
unwrap the one extra layer everywhere. It would definitely not be
backwards compatible at the source level, but source compatibility
doesn't work super well with capnproto in general.

But I think there's a similar ergonomics vs. extensibility trade-off
with the current situation, where it's not possible to turn a struct
into a union after the fact (See Yaron Minsky's recent mail). You could
tweak things so that there's always a tag field, and client code always
has to switch on it even if the schema doesn't define any other
variants. So you don't have to worry about the consequences of adding a
variant where there wasn't one, but you have to actually match on every
datatype, vs. not needing to worry about adding common fields but having
an extra .union_ or such to every use of a sum type.

I think this is basically the expression problem at work.
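
A rough sketch of the trade-off, using C++17's `std::variant` as a stand-in for a native sum type. The `Circle`/`Square` schema, the `label` field, and the function names are all hypothetical, and this is not what any code generator actually emits.

```cpp
#include <string>
#include <variant>

// Hypothetical schema, version 1: the struct contains only a union, so it
// can map directly onto a native sum type.
struct Circle { double radius; };
struct Square { double side; };
using ShapeV1 = std::variant<Circle, Square>;

double areaV1(const ShapeV1& s) {
  if (const auto* c = std::get_if<Circle>(&s)) {
    return 3.14159 * c->radius * c->radius;
  }
  const Square& q = std::get<Square>(s);
  return q.side * q.side;
}

// Version 2 adds a common field. The sum type now sits one layer down
// (named union_ here), so every use site has to unwrap it.
struct ShapeV2 {
  std::string label;
  std::variant<Circle, Square> union_;
};

double areaV2(const ShapeV2& s) { return areaV1(s.union_); }
```

Going from `ShapeV1` to `ShapeV2` is the mechanical wrap-and-unwrap change described above. Always generating the `ShapeV2` shape avoids the break, at the cost of an extra `.union_` at every use of a sum type.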

>    But I do see why you'd want to use the language's built-in variant
>    types if at all possible.

>    FWIW I feel this is a fundamental flaw in variant types as seen in most
>    functional languages: if you ever discover there's some field that is
>    needed by *all* the variants, you can't add it without completely
>    changing the type and updating every use site.

Yeah, this is always the tension with things like IDLs; it would be nice
if most programming languages had sums and products that distributed
over one another, but given that they don't, you have to contend with
the fact that if you add features like that to the schema language, they
won't map well (see also Go & default values). I tend to favor
optimizing for the call site, which in this case means the programming
language, rather than the schema language.

There are of course exceptions though; having unions is still a big boon
even in languages like Go that don't have proper sum types at all.

>    My main motivation was just that it feels inconsistent to allow
>    defaults only for primitives but not for pointers. And default values
>    for primitives are used all the time.
>    But perhaps I should have been more practical here.

There are already enough inconsistencies between how pointer vs.
non-pointer types behave that you have to keep the difference in mind;
given that, I don't think you gain much from this little bit of
extra consistency.

Re: "mmap()" in elm in another of your messages, this is why I suggested
that Mezuzza get in touch with Evan early; there's no reason why the
bytes package couldn't provide this functionality, but it currently
doesn't, and the javascript FFI isn't usable for this sort of thing, so
whether or not Evan seems inclined to add the needed support is really
important to where you end up taking the design.

Also, I'd actually love to see (and in the past toyed with the idea of
writing) an elm generator that spits out idiomatic data types and
encoders & decoders built on Elm's json package; this would be great for
talking to sandstorm's postMessage API. In that case you're not even
touching the capnp wire format.

-Ian

-- 
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
Visit this group at https://groups.google.com/group/capnproto.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/capnproto/BYAPR11MB25995FF666E3F50818F69DC1C5190%40BYAPR11MB2599.namprd11.prod.outlook.com.
