On Tue, 2018-02-27 at 10:36 -0800, Kenton Varda wrote:
> Fun stuff.

Thanks, glad you liked it :)
> 1) Eliminate the concept of segments from the encoding. Segments need
> only exist at write time, to allow for progressive allocation of
> additional message space. But, tracking this could be handled
> entirely in the MessageBuilder implementation. All the segments could
> be concatenated at write time, and the receiver could then treat the
> message as one large segment. Inter-segment pointers no longer need
> to be far pointers; as long as the distance is under 4GB, a regular
> relative pointer is fine. (Builder implementations would recognize
> cross-segment pointers by the fact that they fail the bounds check.)

Hm. I don't see how you could do this without a full-on O(n) compaction
pass on transmission/write-out. (Though you could argue that writing out
the message is O(n) in the first place, so you don't lose anything.)
Even with such a pass, I'm not sure I know how to do it and still allow
the message to grow arbitrarily in several places.

As far as I understand it, what you're describing is like allocating a
4GB buffer for the message and reserving some space inside it for each
place where the message can grow. Only you don't store the unused bytes
in the buffer; instead you use a sort of virtual addressing to map it to
a collection of chunks (the "segments"). You don't transmit the unused
bytes either, and instead compact the final message on transmission. The
problem is, if the reserved space runs out, you're stuck, short of
moving the rest of the message out of the way.

> 2) All pointers -- including far pointers, if they exist -- should be
> relative to the pointer location, never absolute. (Currently, far
> pointers contain an absolute segment index.) Making all pointers
> relative means that it's always possible to embed an existing message
> into a larger message without doing a tree copy, which turns out to
> be a pretty nifty thing to be able to do.

Oh, I didn't think of that. Yes, that probably makes relative pointers
worth keeping.
You still need to get the message data into the buffer, though, and
that's still O(n) [a much faster O(n) than a tree traversal, admittedly]
unless you use something like scatter/gather, which is not exactly a
standard feature of any of the current capnp libraries (or of popular
operating systems, for that matter).

> 3) Recognize that there's no fundamental need to distinguish on the
> wire whether a pointer points to a struct or a list. All we really
> need to know is the object location, size, and which bits are data
> vs. pointers.

I thought about that too when writing the original email (and mentioned
it in a parenthetical remark), but I wasn't quite sure that not
distinguishing between an object and a list of pointers was safe with
respect to schema evolution.

> The "common" pointer encoding could be:
> [...]
> 16 bits: element count
> 8 bits: data words per element (with special reserved values for
>   0-bit, 1-bit, 8-bit, 16-bit, 32-bit)
> 8 bits: pointers per element

I know I ramble a lot, but if you can stand it, my original email
describes why you don't actually need a pointer count, provided you can
spare a bit in each pointer, plus, in the decoder, a lookup table with
32 bits per struct field for every struct you'll want to read from.
(Well, OK, you need a "has pointers?" flag, but that's it.)

Also, if you store a byte count instead of an element count, you don't
need to distinguish between arrays of 8-, 16-, 32- and 64-bit scalars:
if you're reading, you can look the element size up in the schema, and
if you're only copying, you don't care what the element size is, only
how many bytes the whole thing takes.

The encoding spec mentions you thought about storing nested arrays ---
did this turn out to be unnecessary?

> [...] When the first bit is 0, this indicates an "uncommon pointer",
> which could be any of:
> [...]
> - Trampoline pointer: Like today's "double-far" pointer: points to a
>   two-word object which contains tag information and a further
>   pointer to the actual object content.
> Here we can have 2x 16-bit segment sizes, 61-bit offset, and 35-bit
> element count, which would become the new upper limit on list sizes
> (compared to today's 29-bit limit).

Do you actually use >4G arrays, or even messages, in practice? It seems
like they inevitably complicate the encoding for little benefit, though
I can see how the FlatBuffers game devs would disagree.

Alex
