[protobuf] Binary protocol optimization recommendations

Andrey Dotsenko Sun, 05 Nov 2017 11:29:09 -0800

Hi again!

I'm trying to write a C library with minimal memory allocations, and I've 
found that the protocol is not as perfect as I thought. My target is embedded 
systems, but with a compatibility mode for Protocol Buffers v3. That would 
make it easier to write complex software in several languages while most of 
the software is written in C.


So here are my recommendations for a 4th version of the protocol. I've 
simplified the examples (names, etc.) to make them obvious. I understand that 
these buffers run on many servers and that changing the API would break 
compatibility, causing more problems than it solves, but I think systems need 
to be renewed over time.

1. The wire type is not logical. It is better thought of as an encoding 
type. Here's my vision of this field:

The encoding should be limited to vint, fixed or flexible (length-prefixed). 
Only the encoding would be sent over the wire.

typedef enum field_encoding_t {
    VINT8_UNSIGNED = 0,
    VINT8_SIGNED,
    VINT8_ZIGZAG,
    FIXED32,
    FIXED64,
    FLEXIBLE
} field_encoding_t;
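A minimal sketch of how the encoding could travel in the field key, assuming (my assumption, not part of the proposal) that the key keeps protobuf v3's layout: the field number shifted left by 3 with the encoding in the low 3 bits. Six encodings fit in those 3 bits. make_key, key_field_num, key_encoding and put_varint32 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encoding enum from the proposal, repeated so the sketch compiles alone. */
typedef enum field_encoding_t {
    VINT8_UNSIGNED = 0,
    VINT8_SIGNED,
    VINT8_ZIGZAG,
    FIXED32,
    FIXED64,
    FLEXIBLE
} field_encoding_t;

/* Hypothetical key layout: field number above, encoding in the low 3 bits.
 * Values 6 and 7 are unused, so a decoder should reject them. */
static uint32_t make_key(uint32_t field_num, field_encoding_t enc)
{
    assert(field_num > 0 && field_num < (1u << 29));
    return (field_num << 3) | (uint32_t)enc;
}

static uint32_t key_field_num(uint32_t key)
{
    return key >> 3;
}

static field_encoding_t key_encoding(uint32_t key)
{
    return (field_encoding_t)(key & 0x7u);
}

/* Writes v as a base-128 varint (7 bits per byte, low bits first, high bit
 * set on every byte except the last). Returns the number of bytes written. */
static size_t put_varint32(uint8_t *out, uint32_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;
}
```

For example, field 16 sent as VINT8_ZIGZAG gives key 130, which takes two bytes on the wire (0x82 0x01).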

The field type, on the other hand, would be known to the encoding and 
decoding libraries, so any program could correctly determine the type of 
received data from its encoding and the expected type.
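On the decoding side, the pairing of expected type and received encoding could be checked with a small rule table. This is a hypothetical sketch: encoding_fits_type and the exact set of accepted pairs are my assumptions, chosen to match the i32/i64 examples (for instance, an I64 field accepts signed vint, zigzag vint or FIXED64).

```c
#include <stdbool.h>

typedef enum field_encoding_t {
    VINT8_UNSIGNED = 0, VINT8_SIGNED, VINT8_ZIGZAG, FIXED32, FIXED64, FLEXIBLE
} field_encoding_t;

typedef enum field_type_t {
    U32, I32, U64, I64, FLOAT32, FLOAT64, ENUM32, STRING, BINARY, MESSAGE
} field_type_t;

/* Hypothetical rule: may a value received with encoding `enc` be decoded
 * into a field whose expected type is `type`? */
static bool encoding_fits_type(field_type_t type, field_encoding_t enc)
{
    switch (type) {
    case U32:
        return enc == VINT8_UNSIGNED || enc == FIXED32;
    case U64:
        return enc == VINT8_UNSIGNED || enc == FIXED64;
    case I32:
        return enc == VINT8_SIGNED || enc == VINT8_ZIGZAG || enc == FIXED32;
    case I64:
        return enc == VINT8_SIGNED || enc == VINT8_ZIGZAG || enc == FIXED64;
    case ENUM32:
        return enc == VINT8_UNSIGNED || enc == VINT8_SIGNED;
    case FLOAT32:
        return enc == FIXED32;
    case FLOAT64:
        return enc == FIXED64;
    case STRING:
    case BINARY:
    case MESSAGE:
        return enc == FLEXIBLE;
    }
    return false; /* unknown type value */
}
```

A decoder would call this after reading the key and reject (or skip) the field when it returns false.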

enum field_type_t {
    U32,
    I32,
    U64,
    I64,
    FLOAT32,
    FLOAT64,
    ENUM32,
    STRING,
    BINARY,
    MESSAGE,
};

Example of encoding and decoding:

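#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

/* iter_t (assumed to hold a uint8_t *ptr), encode_field_key(), write(),
 * write_i32_as_vint8() and write_i32_as_zzvint8() are the library's own
 * helpers and are not shown here. */

ssize_t encode_field_i32(iter_t *iter,
                         int field_num,
                         field_encoding_t field_encoding,
                         int32_t value)
{
    /* Remember where the field starts so a failure can be rolled back. */
    uint8_t *start = iter->ptr;
    ssize_t encoded_size;

    encoded_size = encode_field_key(iter, field_num, field_encoding);
    if (encoded_size < 0) {
        return encoded_size;
    }

    switch (field_encoding) {
    case FIXED32:
        /* Copies the host byte order; protobuf's fixed types are little-endian. */
        encoded_size = write(iter, &value, sizeof(value));
        break;
    case VINT8_SIGNED:
        encoded_size = write_i32_as_vint8(iter, value);
        break;
    case VINT8_ZIGZAG:
        encoded_size = write_i32_as_zzvint8(iter, value);
        break;
    default:
        errno = EINVAL;
        encoded_size = -1;
    }
    if (encoded_size < 0) {
        goto aborting;
    }

    /* Total bytes written for this field: key plus value. */
    return iter->ptr - start;

aborting:
    iter->ptr = start;
    return encoded_size;
}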

ssize_t
decode_field_value_i64(iter_t *iter,
                       field_encoding_t field_encoding,
                       int64_t *value)
{
    /* Decode according to the encoding announced in the field key. */
    switch (field_encoding) {
    case VINT8_SIGNED:
        return read_vint8_as_i64(iter, value);
    case VINT8_ZIGZAG:
        return read_zzvint8_as_i64(iter, value);
    case FIXED64:
        /* Copies the host byte order; protobuf's fixed types are little-endian. */
        return read(iter, value, sizeof(*value));
    default:
        errno = EINVAL;
        return -1;
    }
}
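The examples call helpers such as write_i32_as_zzvint8 and read_zzvint8_as_i64 without showing them. Here is one possible implementation, assuming a hypothetical iter_t that holds a current position and an end pointer. The varint format is the base-128 scheme protobuf uses; ZigZag is done on 64 bits, which for int32 inputs gives the same numbers as protobuf's 32-bit sint32 ZigZag.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical iterator: current position and end of the buffer. */
typedef struct {
    uint8_t *ptr;
    uint8_t *end;
} iter_t;

/* ZigZag: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ... (no implementation-defined shifts). */
static uint64_t zigzag_encode64(int64_t v)
{
    uint64_t u = (uint64_t)v;
    return (u << 1) ^ (0 - (u >> 63));
}

static int64_t zigzag_decode64(uint64_t u)
{
    return (int64_t)(u >> 1) ^ -(int64_t)(u & 1);
}

/* Base-128 varint writer; leaves the iterator untouched if out of room. */
static ssize_t write_varint(iter_t *it, uint64_t v)
{
    uint8_t *p = it->ptr;
    do {
        if (p == it->end) {
            errno = ENOBUFS;
            return -1;
        }
        uint8_t byte = (uint8_t)(v & 0x7F);
        v >>= 7;
        *p++ = (uint8_t)(byte | (v ? 0x80 : 0));
    } while (v != 0);
    ssize_t n = p - it->ptr;
    it->ptr = p;
    return n;
}

/* Base-128 varint reader; rejects truncated input and varints over 10 bytes. */
static ssize_t read_varint(iter_t *it, uint64_t *out)
{
    uint8_t *p = it->ptr;
    uint64_t v = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (p == it->end) {
            errno = EINVAL;
            return -1;
        }
        uint8_t byte = *p++;
        v |= (uint64_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            ssize_t n = p - it->ptr;
            *out = v;
            it->ptr = p;
            return n;
        }
    }
    errno = EINVAL;
    return -1;
}

static ssize_t write_i32_as_zzvint8(iter_t *it, int32_t value)
{
    return write_varint(it, zigzag_encode64(value));
}

static ssize_t read_zzvint8_as_i64(iter_t *it, int64_t *value)
{
    uint64_t u;
    ssize_t n = read_varint(it, &u);
    if (n >= 0) {
        *value = zigzag_decode64(u);
    }
    return n;
}
```

Like the encoder above, both helpers return the number of bytes consumed or -1 with errno set, and never advance the iterator on failure.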

As you can see, the protocol becomes more flexible. Now I can change the 
encoding on the encoding node, and the decoding node will automatically 
decode the changed data correctly. Moreover, it makes it possible to change 
the encoding over time. For example, if integer values sent over time drift 
toward negative numbers, we can switch the encoding algorithm on the encoding 
node without any need to recompile (or restart) the decoding program. I think 
this would be useful for servers with long uptimes.
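To make the "drift toward negative numbers" case concrete: in protobuf v3 a negative int32 sent as a plain varint is sign-extended and always takes 10 bytes, while ZigZag keeps small negatives at 1 byte. The sketch below estimates the size of a batch under each int32 encoding and picks the smallest. It assumes VINT8_SIGNED behaves like protobuf's int32 (sign-extended to 64 bits); choose_i32_encoding is a hypothetical policy, not part of the proposal.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum field_encoding_t {
    VINT8_UNSIGNED = 0, VINT8_SIGNED, VINT8_ZIGZAG, FIXED32, FIXED64, FLEXIBLE
} field_encoding_t;

/* Number of bytes a base-128 varint needs for v (1..10). */
static size_t varint_size(uint64_t v)
{
    size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        n++;
    }
    return n;
}

/* Assumption: VINT8_SIGNED sign-extends to 64 bits like protobuf's int32,
 * so every negative value costs 10 bytes. */
static size_t encoded_size_i32(int32_t v, field_encoding_t enc)
{
    uint64_t u = (uint64_t)(int64_t)v;
    switch (enc) {
    case VINT8_SIGNED:
        return varint_size(u);
    case VINT8_ZIGZAG:
        return varint_size((u << 1) ^ (0 - (u >> 63)));
    case FIXED32:
        return 4;
    default:
        return 0; /* not an int32 encoding in this sketch */
    }
}

/* Hypothetical encoder-side policy: pick the int32 encoding that would have
 * produced the fewest bytes for a window of recent values. */
static field_encoding_t choose_i32_encoding(const int32_t *vals, size_t count)
{
    static const field_encoding_t candidates[] = {
        VINT8_SIGNED, VINT8_ZIGZAG, FIXED32
    };
    field_encoding_t best = VINT8_SIGNED;
    size_t best_total = SIZE_MAX;
    for (size_t c = 0; c < sizeof candidates / sizeof candidates[0]; c++) {
        size_t total = 0;
        for (size_t i = 0; i < count; i++) {
            total += encoded_size_i32(vals[i], candidates[c]);
        }
        if (total < best_total) {
            best_total = total;
            best = candidates[c];
        }
    }
    return best;
}
```

Under the proposal the decoder learns the encoding from every key, so the encoder could rerun such a policy on a sliding window and switch whenever the result changes.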

2. Why do we need a submessage size?

Before I started working on submessages, I could use a single buffer and 
realloc it with a large reserve from time to time, as is done in asprintf 
implementations. So in many cases only one buffer allocation would be 
needed. But submessages, with the varint length preceding them, break this 
scheme. I have to either compute the size of the submessage in advance 
(which is inefficient) or allocate another buffer and, in my case, memcpy it 
afterwards.

For strings, binary data or arrays I always know the size of the data, so I 
can write it before writing the data itself. But for submessages I need to 
compute it.
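One way to keep a single growing buffer despite the length prefix (a common workaround, not something the protocol or this proposal specifies) is to reserve the maximum varint size for the length, 5 bytes for a 32-bit length, encode the submessage body right after it, then write the real length and slide the body back over the unused bytes with memmove. begin_submessage, end_submessage and the iter_t layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 32-bit length never needs more than 5 varint bytes. */
#define MAX_VARINT32 5

/* Hypothetical iterator: current write position and end of the buffer. */
typedef struct {
    uint8_t *ptr;
    uint8_t *end;
} iter_t;

static size_t put_varint32(uint8_t *out, uint32_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;
}

/* Reserve room for the worst-case length prefix; the caller then encodes
 * the submessage fields at it->ptr. Returns the slot, or NULL if full. */
static uint8_t *begin_submessage(iter_t *it)
{
    if (it->end - it->ptr < MAX_VARINT32) {
        return NULL;
    }
    uint8_t *slot = it->ptr;
    it->ptr += MAX_VARINT32;
    return slot;
}

/* Write the real length into the slot and slide the body left over the
 * unused reserved bytes. Assumes the body is shorter than 4 GiB.
 * Returns prefix size plus body size. */
static size_t end_submessage(iter_t *it, uint8_t *slot)
{
    uint8_t *body = slot + MAX_VARINT32;
    size_t body_len = (size_t)(it->ptr - body);
    size_t len_size = put_varint32(slot, (uint32_t)body_len);
    /* Source and destination overlap, so memmove rather than memcpy. */
    memmove(slot + len_size, body, body_len);
    it->ptr = slot + len_size + body_len;
    return len_size + body_len;
}
```

This trades the size pre-pass or the second buffer for one memmove of the body per submessage, repeated at each nesting level.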

I've analyzed the protocol and didn't find any need for the message length. 
Why did you include it? I could still decode a submessage because I know the 
size of all its members. And I didn't find any use case for this size in 
version 3 of Protocol Buffers.


Thanks for your work! For now I send my data as raw packets over sockets, 
but Protocol Buffers would let me use Go together with C, which is great! 
And if Go someday allowed memory-pressure algorithms, it would be the best 
low-level language, but that wish is off-topic here, I think. :)
