Re: [protobuf] Binary protocol optimization recommendations

David Yu Mon, 06 Nov 2017 02:20:37 -0800

On Mon, Nov 6, 2017 at 3:28 AM, Andrey Dotsenko <[email protected]>
wrote:


> Hi again!
>
> I'm trying to write a C library with minimal memory allocations, and I've
> found that the protocol is not as perfect as I thought. My target is
> embedded systems, but with a compatibility mode for Protocol Buffers v3.
> That would make it easier to write complex software in several languages
> while most of the software is written in C.
>
> So here are my recommendations for a fourth version of the protocol. I've
> simplified the examples (names, etc.) to keep them obvious. I understand
> that these buffers run on many servers and that changing the API would
> break compatibility, causing more problems than benefits, but I think
> systems need to be renewed from time to time.
>
> 1. The wire type is not logical. It is better suited to describing the
> encoding. Here's my vision of this field:
>
> The encoding should be limited to vint, fixed or flexible. This is what
> would be sent over the wire.
>
> /* typedef so that field_encoding_t can be used as a plain type name in C */
> typedef enum {
>     VINT8_UNSIGNED = 0,
>     VINT8_SIGNED,
>     VINT8_ZIGZAG,
>     FIXED32,
>     FIXED64,
>     FLEXIBLE
> } field_encoding_t;
>
> The field type, however, would be known to the encoding and decoding
> libraries, so any program could determine the correct type of the received
> data from its encoding and the expected type.
>
> /* typedef so that field_type_t can be used as a plain type name in C */
> typedef enum {
>     U32,
>     I32,
>     U64,
>     I64,
>     FLOAT32,
>     FLOAT64,
>     ENUM32,
>     STRING,
>     BINARY,
>     MESSAGE,
> } field_type_t;
>
> Example of encoding and decoding:
>
> #include <errno.h>
> #include <stdint.h>
> #include <sys/types.h>  /* ssize_t */
>
> /* Encode the key and an int32 value; on failure the iterator is rewound. */
> ssize_t encode_field_i32(iter_t *iter,
>                          int field_num,
>                          field_encoding_t field_encoding,
>                          int32_t value)
> {
>     /* Byte pointer, so that the pointer arithmetic below is valid C. */
>     uint8_t *start = (uint8_t *)iter->ptr;
>     ssize_t encoded_size;
>
>     encoded_size = encode_field_key(iter, field_num, field_encoding);
>     if (encoded_size < 0) {
>         return encoded_size;
>     }
>
>     switch (field_encoding) {
>     case FIXED32:
>         /* Raw copy: assumes a little-endian host. */
>         encoded_size = write(iter, &value, sizeof(value));
>         break;
>     case VINT8_SIGNED:
>         encoded_size = write_i32_as_vint8(iter, value);
>         break;
>     case VINT8_ZIGZAG:
>         encoded_size = write_i32_as_zzvint8(iter, value);
>         break;
>     default:
>         errno = EINVAL;
>         encoded_size = -1;
>     }
>     if (encoded_size < 0) {
>         goto aborting;
>     }
>
>     /* Total bytes written: key plus value. */
>     return (uint8_t *)iter->ptr - start;
>
> aborting:
>     iter->ptr = start;
>     return encoded_size;
> }
>
> /* Decode an int64 value according to the encoding found in the field key. */
> ssize_t
> decode_field_value_i64(iter_t *iter,
>                        field_encoding_t field_encoding,
>                        int64_t *value)
> {
>     switch (field_encoding) {
>     case VINT8_SIGNED:
>         return read_vint8_as_i64(iter, value);
>     case VINT8_ZIGZAG:
>         return read_zzvint8_as_i64(iter, value);
>     case FIXED64:
>         /* Raw copy: assumes a little-endian host. */
>         return read(iter, value, sizeof(*value));
>     default:
>         errno = EINVAL;
>         return -1;
>     }
> }
>
> As you can see, the protocol becomes more flexible. I can now change the
> encoding on the encoding node, and the decoding node will still decode the
> changed data correctly. It also makes it possible to change the encoding
> over time. For example, if the integer values being sent drift toward the
> negative side, we can change the encoding algorithm on the encoding node
> without recompiling (or restarting) the program on the decoding node. I
> think that would be useful for servers with long uptimes.
>
> 2. Why do we need a submessage size?
>
> Before I started working on submessages, I could use a single buffer,
> growing it with realloc by a large reserve from time to time, as is done in
> asprintf implementations. So in many cases only one buffer allocation would
> be needed. But submessages preceded by a varint break this scheme. I have
> to either compute the size of the submessage first (which is inefficient)
> or allocate another buffer and, in my case, memcpy it afterwards.
>
> For strings, binary data or arrays I always know the size of the data, so
> I can write it before writing the data. But for submessages I need to
> compute it.
>
Right.
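
One way to keep a single buffer without a separate sizing pass is to reserve
one byte for the length, write the submessage body, and then patch the length
in afterwards, moving the body only when the length needs more than one byte.
The following is a minimal sketch of that idea, not something taken from any
protobuf library; patch_length and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes a base-128 varint needs to hold v. */
static size_t varint_size(uint64_t v)
{
    size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        n++;
    }
    return n;
}

/* Write v as a varint at dst; returns the number of bytes written. */
static size_t write_varint(uint8_t *dst, uint64_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        dst[n++] = (uint8_t)(v | 0x80);
        v >>= 7;
    }
    dst[n++] = (uint8_t)v;
    return n;
}

/*
 * Hypothetical helper. Call it after the submessage body has been written
 * to buf[len_pos + 1 .. end), i.e. with one byte reserved at len_pos.
 * If the length needs more than one byte, the body is shifted to make room.
 * Returns the new end offset, or 0 if the buffer capacity is too small.
 */
static size_t patch_length(uint8_t *buf, size_t cap, size_t len_pos, size_t end)
{
    size_t body_len = end - (len_pos + 1);
    size_t need = varint_size(body_len);

    if (len_pos + need + body_len > cap) {
        return 0;
    }
    if (need > 1) {
        memmove(buf + len_pos + need, buf + len_pos + 1, body_len);
    }
    write_varint(buf + len_pos, body_len);
    return len_pos + need + body_len;
}
```

Bodies shorter than 128 bytes are never moved; longer ones cost one memmove
per nesting level, which is the trade-off against computing sizes up front.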

>
> I've analyzed the protocol and didn't find any need for the message length.
> Why did you include it?
>
In the early versions of protobuf, submessages were encoded with
START_GROUP/END_GROUP markers, somewhat like JSON's '{' and '}'.
When protobuf v2 was released, groups were deprecated in favor of what we
have now, which requires computing the size of each submessage.
I believe the size was needed by their network/RPC infrastructure, which
relies on peeking at the contents of a buffer and then deciding where to
dispatch it.
Having the size of a submessage available during deserialization means it
can be skipped efficiently.
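
To illustrate the skipping: with a length prefix, stepping over a submessage
is one varint read plus one pointer addition, whereas a group would have to
be walked field by field until the matching END_GROUP. A rough sketch,
assuming the standard wire types (0 varint, 1 64-bit, 2 length-delimited,
5 32-bit); read_varint and skip_field are hypothetical names, and groups
(wire types 3 and 4) are deliberately not handled:

```c
#include <stddef.h>
#include <stdint.h>

/* Read a base-128 varint from [p, end). Returns bytes consumed, 0 on error. */
static size_t read_varint(const uint8_t *p, const uint8_t *end, uint64_t *out)
{
    uint64_t v = 0;
    size_t n = 0;

    while (p + n < end && n < 10) {
        uint8_t b = p[n];
        v |= (uint64_t)(b & 0x7f) << (7 * n);
        n++;
        if ((b & 0x80) == 0) {
            *out = v;
            return n;
        }
    }
    return 0; /* truncated or longer than 10 bytes */
}

/*
 * Skip one field (key and value) starting at p.
 * Returns a pointer just past the field, or NULL on malformed input
 * or an unsupported wire type.
 */
static const uint8_t *skip_field(const uint8_t *p, const uint8_t *end)
{
    uint64_t key, len;
    size_t n = read_varint(p, end, &key);

    if (n == 0) {
        return NULL;
    }
    p += n;

    switch (key & 7) {
    case 0: /* varint: must be read to find its end */
        n = read_varint(p, end, &len);
        return n ? p + n : NULL;
    case 1: /* fixed 64-bit */
        return (size_t)(end - p) >= 8 ? p + 8 : NULL;
    case 2: /* length-delimited: string, bytes or submessage */
        n = read_varint(p, end, &len);
        if (n == 0 || len > (uint64_t)(end - p) - n) {
            return NULL;
        }
        return p + n + (size_t)len; /* one jump, contents never inspected */
    case 5: /* fixed 32-bit */
        return (size_t)(end - p) >= 4 ? p + 4 : NULL;
    default:
        return NULL;
    }
}
```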

When all you need to peek at are numbers, you can get O(1) access to your
data by arranging your proto definition so that all float, double,
fixed32 and fixed64 fields are declared last.
That approach does not require protobuf's deserialization library, since the
offsets of the numbers can be computed statically (preferably by a compiler)
and the buffer accessed directly.
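
As a sketch of what that could look like, assume a hypothetical message with
a string as field 1, a fixed32 "id" as field 2 and a double "value" as
field 3, and assume the serializer writes fields in field-number order and
always emits both numeric fields. Neither is guaranteed by the wire format,
and proto3 omits fields that hold their default value, so the tags are
checked before the values are trusted. Under those assumptions the last 14
bytes of the buffer have a fixed layout:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed tail layout: tag 0x15, 4-byte id, tag 0x19, 8-byte value. */
enum {
    TAIL_SIZE = 14,
    ID_TAG    = 0x15, /* (2 << 3) | 5: field 2, 32-bit wire type */
    VALUE_TAG = 0x19  /* (3 << 3) | 1: field 3, 64-bit wire type */
};

/* Load little-endian integers independently of host byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t load_le64(const uint8_t *p)
{
    return (uint64_t)load_le32(p) | (uint64_t)load_le32(p + 4) << 32;
}

/* Read "id" at a fixed offset from the end. Returns 0 on success, -1 if
 * the buffer does not have the assumed layout. */
static int peek_id(const uint8_t *buf, size_t len, uint32_t *id)
{
    if (len < TAIL_SIZE || buf[len - 14] != ID_TAG) {
        return -1;
    }
    *id = load_le32(buf + len - 13);
    return 0;
}

/* Read "value" at a fixed offset from the end. Assumes the host uses
 * IEEE 754 doubles with the same byte order as its 64-bit integers. */
static int peek_value(const uint8_t *buf, size_t len, double *value)
{
    uint64_t bits;

    if (len < TAIL_SIZE || buf[len - 9] != VALUE_TAG) {
        return -1;
    }
    bits = load_le64(buf + len - 8);
    memcpy(value, &bits, sizeof bits);
    return 0;
}
```

No varint decoding is involved, and the length of the string in field 1
does not matter because the offsets are taken from the end of the buffer.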

If you need O(1) access to all fields (not just numbers), you might be
interested in flatbuffers <https://github.com/google/flatbuffers>.
In my C++ projects, I use it instead of protobuf and it works great there.

> I could still decode a submessage because I know the size of all its
> members. And I didn't find any use case for this size in the third version
> of Protocol Buffers.
>
>
> Thanks for your work! For now I send my data as raw packets over sockets,
> but Protocol Buffers would let me use Go together with C, which is great!
> And if Go allowed memory pressure algorithms someday, it would be the best
> low-level language, but I think that wish is off-topic here. :)
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/protobuf.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
When the cat is away, the mouse is alone.
dyuproject.com
