Re: [DISCUSS]: KIP-625: Richer encodings for integral-typed protocol fields.

Colin McCabe Fri, 19 Jun 2020 13:18:15 -0700

On Tue, Jun 16, 2020, at 07:42, Tom Bentley wrote:
> Hi Colin,
> 
> Thanks for taking a look at it. Replies inline...
> 
> On Mon, Jun 15, 2020 at 6:22 PM Colin McCabe <cmcc...@apache.org> wrote:
> 
> > Hi Tom,
> >
> > It's an interesting idea.  Obviously protocol buffers does this for all
> > numeric fields.
> >
> > I have to admit I have some mixed feelings, since this is another thing
> > that makes encoding more complex.  And it's not a clear win in all cases,
> > although it is in some.
> >
> 
> Yeah, both these points are valid. Obviously I chose MetadataResponse
> because the benefits are more apparent, but there are plenty of RPCs where
> it wouldn't make so much difference. I provided the benchmark numbers to
> try to make it a little easier, but of course they only illuminate the
> MetadataResponse case. It wouldn't be so very much effort to generate
> benchmarks for each RPC (driven off a JSON representation of a "typical"
> message) which would make it easier to judge on a case-by-case basis (both
> this and other protocol changes).
> 
> I assume that the performance numbers here are for the old Struct-based
> > encoding / decoding system.  Do we know what the numbers are when using the
> > generated read and write functions?
> >
> 
> I've added them to the proposal.
> 
> 
> > I don't think it makes sense for type to be version-dependent.  Type
> > ultimately translates into what Java type we should use to represent the
> > field when it's in POJO form.  We can't have two types there.
> >
> 
> I've removed that section from the proposal.
> 
> 
> > Making encoding version-dependent is reasonable.  I do sort of wonder if
> > we should just have something like "packedVersions" : "9+" rather than the
> > "encoding" thing, though.  The latter is more conceptually elegant but it
> > seems like it would be a pain to use.  For example, for a new field you
> > would have to type "encoding": { "0+" : "packed" } which is kind of ugly.
> >
> 
> Yeah, I wondered about having this on the message level.
>


Hi Tom,

To be clear, I wasn't proposing having it on the message level, but on the 
level of individual fields, similar to nullableVersions.  Although there may be 
some advantages to making it a per-message field, like flexibleVersions....

>
> Having a message-level "packedVersions" wouldn't help with evolving types
> from 32 to 64 bits. This is something I briefly alluded to in the initial
> version of the KIP, but I've now elaborated on it. TBH I'm still not sure
> how useful that would be (the review process seems to do a good job of
> considering field sizes, so perhaps there are no fields where, in
> hindsight, using a wider field would have been beneficial). So I've added
> this for now, and am including the number of bits in the encoding name, but
> I can easily drop this if it's irrelevant.
> 
> To alleviate the burden of using encoding for new fields I've allowed
> "encoding" to support string values as well as JSON objects, e.g.
> "encoding": "packed", as a shortcut when the same encoding was used for all
> versions.
> 

I have mixed feelings about this, since it makes the JSON parsing a bit more 
complex.  I guess it's probably worth it, given the gain in expressiveness.  
It's not hard to support from Jackson but consuming the JSON from other places 
might get harder.

>
> In any case, I'd be happy to implement support for this via a message-level
> "packedVersions" is that's what the consensus of this discussion is.
> 
> 
> > Another thing here is that if we are going to go all this way for
> > optimization, we should certainly give people the choice of whether to use
> > zigzag encoding or not.  For fields that can never be negative, zigzag
> > encoding is a waste.  So you would then have the option of unpacked, signed
> > packed, and unsigned packed.
> >
> 
> That is what's proposed in the KIP, though maybe I explained it poorly.
> 
> 
> > Finally, the Kafka protocol has a lot of fields which can never be
> > negative, except for some special cases where they are -1.  But no other
> > negative numbers are allowed.  So we should consider making the "unsigned
> > packed" option actually encode num + 1 just to support this usage.  That's
> > what we did for string and bytes prefix length encoding and it worked well.
> >
> 
> I considered this for error codes, but there the -1 case
> (UNKNOWN_SERVER_ERROR) would be rare enough that it didn't feel like it was
> worth the effort. But I overlooked other cases where -1 is used in a
> similar way. I think the hardest thing here would be to come up with a
> sensible name for this encoding, since it's not plain unsigned packed. I
> suppose the name could include the offset, i.e. upacked32+1 for example.
> 

Hmm.  I think it would be reasonable to just have unsigned packed always work 
this way.  There's not any major disadvantage.

> 
> > Before we do all this, though, one simpler improvement would be making all
> > the "error" fields into tagged fields.  Most of them remain at 0 most of
> > the time, so this could very well provide a big savings without any big
> > encoding changes.
> >
> 
> Using a tag for error codes and messages is a good idea.
> 

I guess I was suggesting doing this first, since it's easy? :)

best,
Colin

>
> Kind regards,
> 
> Tom
>

Re: [DISCUSS]: KIP-625: Richer encodings for integral-typed protocol fields.

Reply via email to