On Wed, Feb 28, 2018 at 1:08 PM, <[email protected]> wrote:

> Hm.  I don't see how you could do this without a full-on O(n)
> compaction pass on transmission/write-out.  (Though you could argue
> that writing out the message is O(n) in the first place, so you don't
> lose anything.)  Even with one, I'm not sure I know how to do it (and
> still allow the message to grow arbitrarily in several places).
>

I don't understand. What I described shouldn't change write logistics.

Think of it this way: We're saying that far pointers can instead be
represented as regular pointers that happen to have an out-of-bounds
offset. Previously, such pointers were simply invalid. Now, we interpret
them as pointing into the next segment.

That said, given that segment boundaries no longer matter, it might make
sense to replace the segment table with a single 64-bit size -- or in the
case of files (where the filesystem tracks the size), no prefix is needed
at all.
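To make the idea concrete, here's a minimal sketch of that resolution rule (the names and sizes are hypothetical, not the actual Cap'n Proto wire format): an offset that lands past the end of the current segment is taken as an offset into the next one.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical illustration (not the real Cap'n Proto wire format): a
// pointer whose word offset lands past the end of its own segment is
// reinterpreted as pointing into the next segment, so no separate
// "far pointer" encoding is needed.
struct Target { uint32_t segment; uint64_t wordIndex; };

Target resolve(uint32_t segment, uint64_t pointerPos, int64_t offset,
               const std::vector<uint64_t>& segmentSizes) {
  // Offsets are taken relative to the word after the pointer, as in capnp.
  uint64_t target = pointerPos + 1 + offset;
  if (target < segmentSizes[segment]) {
    return {segment, target};  // ordinary intra-segment pointer
  }
  // Out of bounds: the overshoot becomes an offset into the next segment.
  return {segment + 1, target - segmentSizes[segment]};
}
```

Under this reading, previously-invalid pointers acquire a meaning, so no existing valid message changes interpretation.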

> Oh, I didn't think of that.  Yes, that probably makes relative pointers
> worth keeping.  You still need to get the message data into the buffer,
> and that's still O(n) [though a much faster O(n) than a tree traversal]
> unless you use something like scatter/gather, which is not exactly a
> standard feature of any of the current capnp libraries (or popular
> operating systems for that matter).
>

Eh? kj::OutputStream and kj::AsyncOutputStream both support gather-writes,
and Cap'n Proto uses them when serializing to a stream. This translates into
a writev() syscall.
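For illustration, here's the general shape of a gather-write (a generic POSIX sketch, not Cap'n Proto's actual serialization code): several discontiguous buffers go to the kernel in a single writev() call, with no userspace copy into one contiguous buffer.

```cpp
#include <sys/uio.h>   // writev()
#include <unistd.h>
#include <cstring>

// Gather-write: hand the kernel several discontiguous buffers (think
// "segment table" plus each message segment) in a single syscall, so no
// userspace copy into one contiguous buffer is needed.
ssize_t gatherWriteDemo(int fd) {
  const char header[] = "header:";  // stand-in for a segment table
  const char body[]   = "body";     // stand-in for segment data
  struct iovec iov[2];
  iov[0].iov_base = const_cast<char*>(header);
  iov[0].iov_len  = strlen(header);
  iov[1].iov_base = const_cast<char*>(body);
  iov[1].iov_len  = strlen(body);
  return writev(fd, iov, 2);  // returns total bytes written
}
```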

Moreover, the C++ implementation of Cap'n Proto supports linking a large
external byte blob into a message without copying it. The blob becomes a
message segment.

I actively use this. Sandstorm.io's spk tool, which constructs a
capnp-encoded archive from a directory tree, actually mmap()s each file and
links it into the archive this way. It then writes out the whole archive at
once. All this happens without ever reading the bytes of any of the files
in userspace. In theory, a really smart kernel and copy-on-write filesystem
have the information they need to reflink the underlying bytes, though
admittedly that's unlikely to happen in practice.


> > 3) Recognize that there's no fundamental need to distinguish on the
> > wire whether a pointer points to a struct or a list. All we really
> > need to know is the object location, size, and which bits are data
> > vs. pointers.
>
> I thought about that too when writing the original email (and mentioned
> it in a parenthetical remark), but I wasn't quite sure not
> distinguishing between an object and a list of pointers was safe wrt
> schema evolution.
>

I think it only creates new compatibilities, not incompatibilities.
Specifically a struct field would be compatible with a list-of-struct field
so long as the list contained only one element. I'd probably take it a step
further and say that replacing a struct with a list-of-structs is a
supported upgrade, and that old programs will always use the first element
of the list. This is actually a fairly common change to want to make in
real-world protocols.


> > The "common" pointer encoding could be:
> > [...]
> > 16 bits: element count
> > 8 bits: data words per element (with special reserved values for 0-bit,
> > 1-bit, 8-bit, 16-bit, 32-bit)
> > 8 bits: pointers per element
>
> I know I ramble a lot, but if you can stand it, my original email
> describes why you don't actually need a pointer count, provided you can
> spare a bit in each pointer and a lookup table with 32 bits per struct
> field in the decoder for every struct you'll want to read from.  (Well,
> OK, you need a "has pointers?" flag, but that's it.)
>

I read that, but I don't like it. I'm uncomfortable with the idea that, to
determine the size of the sections, you either need a lookup table based on
the schema, or you need to do an O(n) scan over the pointers. I'll grant
you that you can make this work for the obvious use cases: copying is O(n)
anyway, and bounds-checking in accessor methods could be based on a check
against the total size rather than a check against the specific location
you're trying to read. But the code to implement both these cases becomes
significantly more complicated.

I think other, obscure cases may be broken. For example, take the case
where you initialize a builder by deep-copying a message from a reader,
then you traverse that message in the builder. Say you follow some struct
pointer. How do you determine if the target object has a compatible layout?
It could be that the total size is a match, but the section sizes don't
match. You have two options:

1) Scan the pointers to verify there are the correct number. This would be
unacceptably slow -- repeatedly get()ing the same pointer should be fast,
but here we need to do an O(n) scan each time.
2) Just assume that if the struct has the right overall size, then it must
have been created with the same version of the schema and therefore has the
correct section sizes. In this case, though, you now have to account for
the possibility that you'll accidentally try to interpret a pointer as
data, or data as a pointer. You can thus accidentally leak data (by
overwriting a pointer, thus losing the connection to whatever it pointed
at) or encounter invalid pointers (something that's actually not possible
in builders currently).

So no, I don't think eliminating the separate section sizes is feasible.
But greatly reducing their range in the common case makes a lot of sense.
In fact, 4-bit counts would probably be fine in 99% of cases, so the
section sizes could be packed into a single byte if that were necessary.
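As a concrete sketch of that packing (a hypothetical layout, not an existing wire format): two 4-bit counts share one byte, allowing up to 15 data words and 15 pointers per struct.

```cpp
#include <cstdint>

// Hypothetical packed size byte: low nibble = data words, high nibble =
// pointer count, each capped at 15. This sketches the "4-bit counts"
// idea above; it is not part of any real format.
inline uint8_t packSizes(uint8_t dataWords, uint8_t pointerCount) {
  return static_cast<uint8_t>((dataWords & 0x0F) | ((pointerCount & 0x0F) << 4));
}
inline uint8_t unpackDataWords(uint8_t packed)    { return packed & 0x0F; }
inline uint8_t unpackPointerCount(uint8_t packed) { return packed >> 4; }
```

Structs needing more than 15 words in either section would fall back to a wider encoding, which fits the "common pointer" / "uncommon pointer" split discussed above.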


> The encoding spec mentions you thought about storing nested arrays---
> did this turn out to be unnecessary?
>

I don't recall ever having found a use for them.

> > [...] When the first bit is 0, this indicates an "uncommon pointer",
> > which could be any of:
> > [...]
> > - Trampoline pointer: Like today's "double-far" pointer: points to a
> > two-word object which contains tag information and a further pointer
> > to the actual object content. Here we can have 2x16-bit segment
> > sizes, 61-bit offset, and 35-bit element count, which would become
> > the new upper limit on list sizes (compared to today's 29-bit limit).
>
> Do you actually use >4G arrays or even messages in practice?  It seems
> like they inevitably complicate the encoding for little benefit, though
> I see how the FlatBuffer game devs would disagree.
>

I have used them occasionally, such as for the aforementioned Sandstorm
archive format. I do like the idea of designing the format such that it can
handle as large a message as you could realistically want.

It would be nice, though, if e.g. embedded use cases could assume no
message is that large, and shed the code to handle it. In my proposal, tag
pointers and trampoline pointers could be left out, simplifying code
considerably.

-Kenton

-- 
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
Visit this group at https://groups.google.com/group/capnproto.
