Re: [DISCUSS] KIP-482: The Kafka Protocol should Support Optional Fields

Colin McCabe Mon, 19 Aug 2019 10:01:29 -0700

Hi Satish,

I wasn't originally going to propose supporting the struct type, although 
perhaps we could consider it.


In general, supporting a struct containing an array has the same serialization 
issues as just supporting the array.

Probably, we should just have a two-pass serialization mechanism where we 
calculate lengths first and then write out bytes.  If we do that, we can avoid 
having any restrictions on tagged fields vs. regular fields.  I will take a 
look at how complex this would be.

best,
Colin


On Sun, Aug 18, 2019, at 22:27, Satish Duggana wrote:
> Please read struct type as a complex record type in my earlier mail.
> The complex type seems to be defined as Schema[1] in the protocol
> types.
> 
> 1. 
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/protocol/types/Schema.java#L27
> 
> 
> On Mon, Aug 19, 2019 at 9:46 AM Satish Duggana <[email protected]> 
> wrote:
> >
> > Sorry! Colin, I may not have been clear in my earlier query about
> > optional field type restriction. It is mentioned in one of your
> > replies "optional fields are serialized starting with their total
> > length". This brings the question of whether optional fields support
> > struct types (with or without array values). It seems struct types are
> > currently not serialized with total length. I may be missing something
> > here.
> >
> > Thanks,
> > Satish.
> >
> >
> > On Wed, Aug 14, 2019 at 8:03 AM Satish Duggana <[email protected]> 
> > wrote:
> > >
> > > Hi Colin,
> > > Thanks for the KIP. Optional fields and var length encoding support is a 
> > > great
> > > improvement for the protocol.
> > >
> > > >>Optional fields can have any type, except that they cannot be arrays.
> > > Note that the restriction against having tagged arrays is just to simplify
> > > serialization.  We can relax this restriction in the future without 
> > > changing
> > > the protocol on the wire.
> > >
> > > Can an Optional field have a struct type which internally contains an 
> > > array
> > > field at any level?
> > >
> > > Thanks,
> > > Satish.
> > >
> > >
> > >
> > > On Tue, Aug 13, 2019 at 11:49 PM David Jacot <[email protected]> wrote:
> > > >
> > > > Hi Colin,
> > > >
> > > > Thank you for the KIP! Things are well explained!. It is huge 
> > > > improvement
> > > > for the Kafka protocol. I have few comments on the proposal:
> > > >
> > > > 1. The interleaved tag/length header sounds like a great optimisation 
> > > > as it
> > > > would be shorter on average. The downside, as
> > > > you already pointed out, is that it makes the decoding and the specs 
> > > > more
> > > > complex. Personally, I would also favour using two
> > > > vaints in this particular case to keep things simple.
> > > >
> > > > 2. As discussed, I wonder if it would make sense to extend to KIP to 
> > > > also
> > > > support optional fields in the Record Header. I think
> > > > that it could be interesting to have such capability for common fields
> > > > across all the requests or responses (e.g. tracing id).
> > > >
> > > > Regards,
> > > > David
> > > >
> > > >
> > > >
> > > > On Tue, Aug 13, 2019 at 10:00 AM Jason Gustafson <[email protected]> 
> > > > wrote:
> > > >
> > > > > > Right, I was planning on doing exactly that for all the 
> > > > > > auto-generated
> > > > > RPCs. For the manual RPCs, it would be a lot of work. It’s probably a
> > > > > better use of time to convert the manual ones to auto gen first (with 
> > > > > the
> > > > > possible exception of Fetch/Produce, where the ROI may be higher for 
> > > > > the
> > > > > manual work)
> > > > >
> > > > > Yeah, that makes sense. Maybe we can include the version bump for all 
> > > > > RPCs
> > > > > in this KIP, but we can implement it lazily as the protocols are 
> > > > > converted.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Mon, Aug 12, 2019 at 7:16 PM Colin McCabe <[email protected]> 
> > > > > wrote:
> > > > >
> > > > > > On Mon, Aug 12, 2019, at 11:22, Jason Gustafson wrote:
> > > > > > > Hi Colin,
> > > > > > >
> > > > > > > Thanks for the KIP! This is a significant improvement. One of my
> > > > > personal
> > > > > > > interests in this proposal is solving the compatibility problems 
> > > > > > > we
> > > > > have
> > > > > > > with the internal schemas used to define consumer offsets and
> > > > > transaction
> > > > > > > metadata. Currently we have to guard schema bumps with the 
> > > > > > > inter-broker
> > > > > > > protocol format. Once the format is bumped, there is no way to
> > > > > downgrade.
> > > > > > > By fixing this, we can potentially begin using the new schemas 
> > > > > > > before
> > > > > the
> > > > > > > IBP is bumped while still allowing downgrade.
> > > > > > >
> > > > > > > There are a surprising number of other situations we have 
> > > > > > > encountered
> > > > > > this
> > > > > > > sort of problem. We have hacked around it in special cases by 
> > > > > > > allowing
> > > > > > > nullable fields to the end of the schema, but this is not really 
> > > > > > > an
> > > > > > > extensible approach. I'm looking forward to having a better 
> > > > > > > option.
> > > > > >
> > > > > > Yeah, this problem keeps coming up.
> > > > > >
> > > > > > >
> > > > > > > With that said, I have a couple questions on the proposal:
> > > > > > >
> > > > > > > 1. For each request API, we need one version bump to begin 
> > > > > > > support for
> > > > > > > "flexible versions." Until then, we won't have the option of using
> > > > > tagged
> > > > > > > fields even if the broker knows how to handle them. Does it make 
> > > > > > > sense
> > > > > to
> > > > > > > go ahead and do a universal bump of each request API now so that 
> > > > > > > we'll
> > > > > > have
> > > > > > > this option going forward?
> > > > > >
> > > > > > Right, I was planning on doing exactly that for all the 
> > > > > > auto-generated
> > > > > > RPCs. For the manual RPCs, it would be a lot of work. It’s probably 
> > > > > > a
> > > > > > better use of time to convert the manual ones to auto gen first 
> > > > > > (with the
> > > > > > possible exception of Fetch/Produce, where the ROI may be higher 
> > > > > > for the
> > > > > > manual work)
> > > > > >
> > > > > > > 2. The alternating length/tag header encoding lets us save a byte 
> > > > > > > in
> > > > > the
> > > > > > > common case. The downside is that it's a bit more complex to 
> > > > > > > specify.
> > > > > It
> > > > > > > also has some extra cost if the length exceeds the tag 
> > > > > > > substantially.
> > > > > > > Basically we'd have to pad the tag, right? I think I'm wondering 
> > > > > > > if we
> > > > > > > should just bite the bullet and use two varints instead.
> > > > > >
> > > > > > That’s a fair point. It would be shorter on average, but worse for 
> > > > > > some
> > > > > > exceptional cases. Also, the decoding would be more complex, which 
> > > > > > might
> > > > > be
> > > > > > a good reason to go for just having two varints. Yeah, let’s 
> > > > > > simplify.
> > > > > >
> > > > > > Regards,
> > > > > > Colin
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Aug 9, 2019 at 4:31 PM Colin McCabe <[email protected]>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I've made some updates to this KIP. Specifically, I wanted to 
> > > > > > > > avoid
> > > > > > > > including escape bytes in the serialization format, since it 
> > > > > > > > was too
> > > > > > > > complex. Also, I think this is a good opportunity to slim down 
> > > > > > > > our
> > > > > > > > variable length fields.
> > > > > > > >
> > > > > > > > best,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jul 11, 2019, at 20:52, Colin McCabe wrote:
> > > > > > > > > On Tue, Jul 9, 2019, at 15:29, Jose Armando Garcia Sancio 
> > > > > > > > > wrote:
> > > > > > > > > > Thanks Colin for the KIP. For my own edification why are we 
> > > > > > > > > > doing
> > > > > > this
> > > > > > > > > > "Optional fields can have any type, except for an array of
> > > > > > > > structures."?
> > > > > > > > > > Why can't we have an array of structures?
> > > > > > > > >
> > > > > > > > > Optional fields are serialized starting with their total 
> > > > > > > > > length.
> > > > > This
> > > > > > > > > is straightforward to calculate for primitive fields like 
> > > > > > > > > INT32,
> > > > > (or
> > > > > > > > > even an array of INT32), but more difficult to calculate for 
> > > > > > > > > an
> > > > > array
> > > > > > > > > of structures. Basically, we'd have to do a two-pass 
> > > > > > > > > serialization
> > > > > > > > > where we first calculate the lengths of everything, and then 
> > > > > > > > > write
> > > > > it
> > > > > > > > > out.
> > > > > > > > >
> > > > > > > > > The nice thing about this KIP is that there's nothing in the
> > > > > protocol
> > > > > > > > > stopping us from adding support for this feature in the 
> > > > > > > > > future. We
> > > > > > > > > wouldn't have to really change the protocol at all to add 
> > > > > > > > > support.
> > > > > > But
> > > > > > > > > we'd have to change a lot of serialization code. Given almost 
> > > > > > > > > all
> > > > > of
> > > > > > > > > our use-cases for optional fields are adding an extra field 
> > > > > > > > > here or
> > > > > > > > > there, it seems reasonable not to support it for right now.
> > > > > > > > >
> > > > > > > > > best,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > -Jose
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
>

Re: [DISCUSS] KIP-482: The Kafka Protocol should Support Optional Fields

Reply via email to