Re: [DISCUSS] KIP-82 - Add Record Headers

Gwen Shapira Thu, 01 Dec 2016 18:21:30 -0800

Based on your last sentence, consider me convinced :)

I get why headers are critical for Mirroring (you need tags to prevent
loops and sometimes to route messages to the correct destination).
But why do you need headers to audit? We are auditing by producing
counts to a side topic (and I was under the impression you do the
same), so we never need to modify the message.
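
To make the mirroring case concrete, here is a minimal sketch of the
loop-prevention tag, not anything KIP-82 specifies. It assumes headers
are a plain mapping of string keys to byte[] values, and the header
name "origin-cluster" and both function names are hypothetical:

```python
# Hypothetical header key; KIP-82 does not define this name.
ORIGIN_HEADER = "origin-cluster"


def tag_origin(headers, local_cluster):
    """Return a copy of headers carrying the producing cluster's ID.

    An existing tag is preserved, so a record keeps its original
    origin no matter how many mirroring hops it has taken.
    """
    tagged = dict(headers)
    tagged.setdefault(ORIGIN_HEADER, local_cluster.encode("utf-8"))
    return tagged


def should_mirror(headers, target_cluster):
    """Return False when the record originated in the target cluster."""
    return headers.get(ORIGIN_HEADER) != target_cluster.encode("utf-8")


if __name__ == "__main__":
    record_headers = tag_origin({}, "dc1")
    print(should_mirror(record_headers, "dc2"))
    print(should_mirror(record_headers, "dc1"))
```

The mirroring process only reads the tag and never touches the
payload, which is the property that makes tags attractive here.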


Another thing - after we add headers, won't you still be in the
business of making sure everyone uses them properly? Making sure
everyone includes the headers you need, doesn't use the header
names you intend to use, etc. I don't think the "policing" business
will ever go away.

On Thu, Dec 1, 2016 at 5:25 PM, Todd Palino <[email protected]> wrote:
> Got it. As an ops guy, I'm not very happy with the workaround. Avro means
> that I have to be concerned with the format of the messages in order to run
> the infrastructure (audit, mirroring, etc.). That means that I have to
> handle the schemas, and I have to enforce rules about good formats. This is
> not something I want to be in the business of, because I should be able to
> run a service infrastructure without needing to be in the weeds of dealing
> with customer data formats.
>
> Trust me, a sizable portion of my support time is spent dealing with schema
> issues. I really would like to get away from that. Maybe I'd have more time
> for other hobbies. Like writing. ;)
>
> -Todd
>
> On Thu, Dec 1, 2016 at 4:04 PM Gwen Shapira <[email protected]> wrote:
>
>> I'm pretty satisfied with the current workarounds (Avro container
>> format), so I'm not too excited about the extra work required to do
>> headers in Kafka. I absolutely don't mind it if you do it...
>> I think the Apache convention for "good idea, but not willing to put
>> any work toward it" is +0.5? anyway, that's what I was trying to
>> convey :)
>>
>> On Thu, Dec 1, 2016 at 3:05 PM, Todd Palino <[email protected]> wrote:
>> > Well I guess my question for you, then, is what is holding you back from
>> > full support for headers? What’s the bit that you’re missing that has you
>> > under a full +1?
>> >
>> > -Todd
>> >
>> >
>> > On Thu, Dec 1, 2016 at 1:59 PM, Gwen Shapira <[email protected]> wrote:
>> >
>> >> I know why people who support headers support them, and I've seen what
>> >> the discussion is like.
>> >>
>> >> This is why I'm asking people who are against headers (especially
>> >> committers) what will make them change their mind - so we can get this
>> >> part over one way or another.
>> >>
>> >> If I sound frustrated it is not at Radai, Jun or you (Todd)... I am
>> >> just looking for something concrete we can do to move the discussion
>> >> along to the yummy design details (which is the argument I really am
>> >> looking forward to).
>> >>
>> >> On Thu, Dec 1, 2016 at 1:53 PM, Todd Palino <[email protected]> wrote:
>> >> > So, Gwen, to your question (even though I’m not a committer)...
>> >> >
>> >> > I have always been a strong supporter of introducing the concept of an
>> >> > envelope to messages, which headers accomplishes. The message key is
>> >> > already an example of a piece of envelope information. By providing a
>> >> > means to do this within Kafka itself, and not relying on use-case specific
>> >> > implementations, you make it much easier for components to interoperate.
>> >> > It simplifies development of all these things (message routing, auditing,
>> >> > encryption, etc.) because each one does not have to reinvent the wheel.
>> >> >
>> >> > It also makes it much easier from a client point of view if the headers
>> >> > are defined as part of the protocol and/or message format in general
>> >> > because you can easily produce and consume messages without having to
>> >> > take into account specific cases. For example, I want to route messages,
>> >> > but client A doesn’t support the way audit implemented headers, and
>> >> > client B doesn’t support the way encryption or routing implemented
>> >> > headers, so now my application has to create some really fragile (my
>> >> > autocorrect just tried to make that “tragic”, which is probably
>> >> > appropriate too) code to strip everything off, rather than just consuming
>> >> > the messages, picking out the 1 or 2 headers it’s interested in, and
>> >> > performing its function.
>> >> >
>> >> > Honestly, this discussion has been going on for a long time, and it’s
>> >> > always “Oh, you came up with 2 use cases, and yeah, those use cases are
>> >> > real things that someone would want to do. Here’s an alternate way to
>> >> > implement them so let’s not do headers.” If we have a few use cases that
>> >> > we actually came up with, you can be sure that over the next year there’s
>> >> > a dozen others that we didn’t think of that someone would like to do. I
>> >> > really think it’s time to stop rehashing this discussion and instead
>> >> > focus on a workable standard that we can adopt.
>> >> >
>> >> > -Todd
>> >> >
>> >> >
>> >> > On Thu, Dec 1, 2016 at 1:39 PM, Todd Palino <[email protected]> wrote:
>> >> >
>> >> >> C. per message encryption
>> >> >>> One drawback of this approach is that this significantly reduces the
>> >> >>> effectiveness of compression, which happens on a set of serialized
>> >> >>> messages. An alternative is to enable SSL for wire encryption and rely
>> >> >>> on the storage system (e.g. LUKS) for at rest encryption.
>> >> >>
>> >> >>
>> >> >> Jun, this is not sufficient. While this does cover the case of removing
>> >> >> a drive from the system, it will not satisfy most compliance
>> >> >> requirements for encryption of data as whoever has access to the broker
>> >> >> itself still has access to the unencrypted data. For end-to-end
>> >> >> encryption you need to encrypt at the producer, before it enters the
>> >> >> system, and decrypt at the consumer, after it exits the system.
>> >> >>
>> >> >> -Todd
>> >> >>
>> >> >>
>> >> >> On Thu, Dec 1, 2016 at 1:03 PM, radai <[email protected]> wrote:
>> >> >>
>> >> >>> another big plus of headers in the protocol is that it would enable
>> >> >>> rapid iteration on ideas outside of core kafka and would reduce the
>> >> >>> number of future wire format changes required.
>> >> >>>
>> >> >>> a lot of what is currently a KIP represents use cases that are not 100%
>> >> >>> relevant to all users, and some of them require rather invasive wire
>> >> >>> protocol changes. i think a good recent example of this is kip-98.
>> >> >>> tx-utilizing traffic is expected to be a very small fraction of total
>> >> >>> traffic and yet the changes are invasive.
>> >> >>>
>> >> >>> every such wire format change translates into painful and slow adoption
>> >> >>> of new versions.
>> >> >>>
>> >> >>> i think a lot of functionality currently in KIPs could be "spun out" and
>> >> >>> implemented as opt-in plugins transmitting data over headers. this would
>> >> >>> keep the core wire format stable(r), core codebase smaller, and avoid
>> >> >>> the "burden of proof" that's sometimes required to prove a certain
>> >> >>> feature is useful enough for a wide-enough audience to warrant a wire
>> >> >>> format change and code complexity additions.
>> >> >>>
>> >> >>> (to be clear - kip-98 goes beyond "mere" wire format changes and i'm not
>> >> >>> saying it could have been completely done with headers, but exactly-once
>> >> >>> delivery certainly could)
>> >> >>>
>> >> >>> On Thu, Dec 1, 2016 at 11:20 AM, Gwen Shapira <[email protected]> wrote:
>> >> >>>
>> >> >>> > On Thu, Dec 1, 2016 at 10:24 AM, radai <[email protected]> wrote:
>> >> >>> > > "For use cases within an organization, one could always use other
>> >> >>> > > approaches such as company-wise containers"
>> >> >>> > > this is what linkedin has traditionally done but there are now cases
>> >> >>> > > (read - topics) where this is not acceptable. this makes headers
>> >> >>> > > useful even within single orgs for cases where
>> >> >>> > > one-container-fits-all cannot apply.
>> >> >>> > >
>> >> >>> > > as for the particular use cases listed, i dont want this to devolve
>> >> >>> > > to a discussion of particular use cases - i think its enough that
>> >> >>> > > some of them
>> >> >>> >
>> >> >>> > I think a main point of contention is that: We identified a few
>> >> >>> > use-cases where headers are useful, do we want Kafka to be a system
>> >> >>> > that supports those use-cases?
>> >> >>> >
>> >> >>> > For example, Jun said:
>> >> >>> > "Not sure how widely useful record-level lineage is though since the
>> >> >>> > overhead could be significant."
>> >> >>> >
>> >> >>> > We know NiFi supports record level lineage. I don't think it was
>> >> >>> > developed for lols, I think it is safe to assume that the NSA needed
>> >> >>> > that functionality. We also know that certain financial institutions
>> >> >>> > need to track tampering with records at a record level and there are
>> >> >>> > federal regulations that absolutely require this. They also need to
>> >> >>> > prove that routing apps that "touch" the messages and either read
>> >> >>> > or update headers couldn't have possibly modified the payload itself.
>> >> >>> > They use record level encryption to do that - apps can read and
>> >> >>> > (sometimes) modify headers but can't touch the payload.
>> >> >>> >
>> >> >>> > We can totally say "those are corner cases and not worth adding
>> >> >>> > headers to Kafka for", they should use a different pubsub system for
>> >> >>> > that (NiFi or one of the other 1000 that cater specifically to the
>> >> >>> > financial industry).
>> >> >>> >
>> >> >>> > But this gets us into a catch 22:
>> >> >>> > If we discuss a specific use-case, someone can always say it isn't
>> >> >>> > interesting enough for Kafka. If we discuss more general trends,
>> >> >>> > others can say "well, we are not sure any of them really needs headers
>> >> >>> > specifically. This is just hand waving and not interesting.".
>> >> >>> >
>> >> >>> > I think discussing use-cases in specifics is super important to decide
>> >> >>> > implementation details for headers (my use-cases lean toward numerical
>> >> >>> > keys with namespaces and object values, others differ), but I think we
>> >> >>> > need to answer the general "Are we going to have headers" question
>> >> >>> > first.
>> >> >>> >
>> >> >>> > I'd love to hear from the other committers in the discussion:
>> >> >>> > What would it take to convince you that headers in Kafka are a good
>> >> >>> > idea in general, so we can move ahead and try to agree on the details?
>> >> >>> >
>> >> >>> > I feel like we keep moving the goal posts and this is truly exhausting.
>> >> >>> >
>> >> >>> > For the record, I mildly support adding headers to Kafka (+0.5?).
>> >> >>> > The community can continue to find workarounds to the issue and there
>> >> >>> > are some benefits to keeping the message format and clients simpler.
>> >> >>> > But I see the usefulness of headers to many use-cases and if we can
>> >> >>> > find a good and generally useful way to add it to Kafka, it will make
>> >> >>> > Kafka easier to use for many - worthy goal in my eyes.
>> >> >>> >
>> >> >>> > > are interesting/feasible, but:
>> >> >>> > > A+B. i think there are use cases for polyglot topics. especially if
>> >> >>> > > kafka is being used to "trunk" something else.
>> >> >>> > > D. multiple topics would make it harder to write portable consumer
>> >> >>> > > code. partition remapping would mess with locality of consumption
>> >> >>> > > guarantees.
>> >> >>> > > E+F. a use case I see for lineage/metadata is billing/chargeback.
>> >> >>> > > for that use case it is not enough to simply record the point of
>> >> >>> > > origin, but every replication stop (think mirror maker) must also
>> >> >>> > > add a record to form a "transit log".
>> >> >>> > >
>> >> >>> > > as for stream processing on top of kafka - i know samza has a
>> >> >>> > > metadata map which they carry around in addition to user values.
>> >> >>> > > headers are the perfect fit for these things.
>> >> >>> > >
>> >> >>> > >
>> >> >>> > >
>> >> >>> > > On Wed, Nov 30, 2016 at 6:50 PM, Jun Rao <[email protected]> wrote:
>> >> >>> > >
>> >> >>> > >> Hi, Michael,
>> >> >>> > >>
>> >> >>> > >> In order to answer the first two questions, it would be helpful if
>> >> >>> > >> we could identify 1 or 2 strong use cases for headers in the space
>> >> >>> > >> for third-party vendors. For use cases within an organization, one
>> >> >>> > >> could always use other approaches such as company-wise containers
>> >> >>> > >> to get around w/o headers. I went through the use cases in the KIP
>> >> >>> > >> and in Radai's wiki (
>> >> >>> > >> https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers
>> >> >>> > >> ).
>> >> >>> > >> The following are the ones that I understand and could be in the
>> >> >>> > >> third-party use case category.
>> >> >>> > >>
>> >> >>> > >> A. content-type
>> >> >>> > >> It seems that in general, content-type should be set at the topic
>> >> >>> > >> level. Not sure if mixing messages with different content types
>> >> >>> > >> should be encouraged.
>> >> >>> > >>
>> >> >>> > >> B. schema id
>> >> >>> > >> Since the value is mostly useless without schema id, it seems that
>> >> >>> > >> storing the schema id together with serialized bytes in the value
>> >> >>> > >> is better?
>> >> >>> > >>
>> >> >>> > >> C. per message encryption
>> >> >>> > >> One drawback of this approach is that this significantly reduces
>> >> >>> > >> the effectiveness of compression, which happens on a set of
>> >> >>> > >> serialized messages. An alternative is to enable SSL for wire
>> >> >>> > >> encryption and rely on the storage system (e.g. LUKS) for at rest
>> >> >>> > >> encryption.
>> >> >>> > >>
>> >> >>> > >> D. cluster ID for mirroring across Kafka clusters
>> >> >>> > >> This is actually interesting. Today, to avoid introducing cycles
>> >> >>> > >> when doing mirroring across data centers, one would either have to
>> >> >>> > >> set up two Kafka clusters (a local and an aggregate) per data
>> >> >>> > >> center or rename topics. Neither is ideal. With headers, the
>> >> >>> > >> producer could tag each message with the producing cluster ID in
>> >> >>> > >> the header. MirrorMaker could then avoid mirroring messages to a
>> >> >>> > >> cluster if they are tagged with the same cluster id.
>> >> >>> > >>
>> >> >>> > >> However, an alternative approach is to introduce something like
>> >> >>> > >> hierarchical topics and store messages from different clusters in
>> >> >>> > >> different partitions under the same topic. This approach avoids
>> >> >>> > >> filtering out unneeded data and makes offset preserving easier to
>> >> >>> > >> support. It may make compaction trickier though since the same key
>> >> >>> > >> may show up in different partitions.
>> >> >>> > >>
>> >> >>> > >> E. record-level lineage
>> >> >>> > >> For example, a source connector could store in the message the
>> >> >>> > >> metadata (e.g. UUID) of the source record. Similarly, if a stream
>> >> >>> > >> job transforms messages from topic A to topic B, the library could
>> >> >>> > >> include the source message offset in the header of each
>> >> >>> > >> transformed message. Not sure how widely useful record-level
>> >> >>> > >> lineage is though since the overhead could be significant.
>> >> >>> > >>
>> >> >>> > >> F. auditing metadata
>> >> >>> > >> We could put things like clientId/host/user in the header in each
>> >> >>> > >> message for auditing. These metadata are really at the producer
>> >> >>> > >> level though. So, a more efficient way is to only include a
>> >> >>> > >> "producerId" per message and send the producerId -> metadata
>> >> >>> > >> mapping independently. KIP-98 is actually proposing including such
>> >> >>> > >> a producerId natively in the message.
>> >> >>> > >>
>> >> >>> > >> So, overall, I am not sure that I am fully convinced of the strong
>> >> >>> > >> third-party use cases of headers yet. Perhaps we could discuss a
>> >> >>> > >> bit more to make one or two really convincing use cases.
>> >> >>> > >>
>> >> >>> > >> Another orthogonal question is whether headers should be exposed in
>> >> >>> > >> stream processing systems such as Kafka Streams, Samza, and Spark
>> >> >>> > >> Streaming. Currently, those systems just deal with key/value pairs.
>> >> >>> > >> Should we expose a third thing, the header, there too or somehow
>> >> >>> > >> map the header to the key or value?
>> >> >>> > >>
>> >> >>> > >> Thanks,
>> >> >>> > >>
>> >> >>> > >> Jun
>> >> >>> > >>
>> >> >>> > >>
>> >> >>> > >> On Tue, Nov 29, 2016 at 3:35 AM, Michael Pearce <[email protected]> wrote:
>> >> >>> > >>
>> >> >>> > >> > I assume that, after a period of a week, there are no concerns now
>> >> >>> > >> > with points 1 and 2, and we now have agreement that headers are
>> >> >>> > >> > useful and needed in Kafka. As such, if put to a KIP vote, this
>> >> >>> > >> > wouldn’t be a reason to reject.
>> >> >>> > >> >
>> >> >>> > >> > @Ignacio on point 4).
>> >> >>> > >> > I think for the purpose of getting this KIP moving past this, we can
>> >> >>> > >> > state the key will be a 4-byte space that will be naturally
>> >> >>> > >> > interpreted as an Int32 (if namespacing is later wanted you can
>> >> >>> > >> > easily split this into two int16 spaces). From the wire protocol
>> >> >>> > >> > implementation I don’t believe this makes any difference. Is this
>> >> >>> > >> > reasonable to all?
>> >> >>> > >> >
>> >> >>> > >> > On 5) as per point 4, therefore happy we keep with 32 bits.
>> >> >>> > >> >
>> >> >>> > >> >
>> >> >>> > >> >
>> >> >>> > >> >
>> >> >>> > >> >
>> >> >>> > >> >
>> >> >>> > >> > On 18/11/2016, 20:34, "[email protected] on behalf of Ignacio Solis" <[email protected] on behalf of [email protected]> wrote:
>> >> >>> > >> >
>> >> >>> > >> >     Summary:
>> >> >>> > >> >
>> >> >>> > >> >     3) Yes - Header value as byte[]
>> >> >>> > >> >
>> >> >>> > >> >     4a) Int,Int - No
>> >> >>> > >> >     4b) Int - Yes
>> >> >>> > >> >     4c) String - Reluctant maybe
>> >> >>> > >> >
>> >> >>> > >> >     5) I believe the header system should take a single int. I think
>> >> >>> > >> >     32 bits is a good size; if you want to interpret this as two
>> >> >>> > >> >     16-bit numbers in the layer above, go right ahead. If somebody
>> >> >>> > >> >     wants to argue for 16 bits or 64 bits of header key space I would
>> >> >>> > >> >     listen.
>> >> >>> > >> >
>> >> >>> > >> >
>> >> >>> > >> >     Discussion:
>> >> >>> > >> >     Dividing the key space into sub_key_1 and sub_key_2 makes no sense
>> >> >>> > >> >     to me at this layer. Are we going to start providing APIs to get
>> >> >>> > >> >     all the sub_key_1s? or all the sub_key_2s? If there are no
>> >> >>> > >> >     distinguishing functions that are applied to each one then they
>> >> >>> > >> >     should be a single value. At this layer all we're doing is
>> >> >>> > >> >     equality.
>> >> >>> > >> >     If the above layer wants to interpret this as 2, 3 or more values
>> >> >>> > >> >     that's a different question. I personally think it's all one
>> >> >>> > >> >     keyspace that is getting assigned using some structure, but if you
>> >> >>> > >> >     want to sub-assign parts of it then that's fine.
>> >> >>> > >> >
>> >> >>> > >> >     The same discussion applies to strings. If somebody argued for
>> >> >>> > >> >     strings, would we be arguing to divide the strings with dots ('.')
>> >> >>> > >> >     as a requirement? Would we want them to give us the different name
>> >> >>> > >> >     segments separately? Would we be performing any actions on this
>> >> >>> > >> >     key other than matching?
>> >> >>> > >> >
>> >> >>> > >> >     Nacho
>> >> >>> > >> >
>> >> >>> > >> >
>> >> >>> > >> >
>> >> >>> > >> >     On Fri, Nov 18, 2016 at 9:30 AM, Michael Pearce <[email protected]> wrote:
>> >> >>> > >> >
>> >> >>> > >> >     > #jay #jun any concerns on 1 and 2 still?
>> >> >>> > >> >     >
>> >> >>> > >> >     > @all
>> >> >>> > >> >     > To get this moving along a bit more I'd also like to ask to get
>> >> >>> > >> >     > clarity on the last points below:
>> >> >>> > >> >     >
>> >> >>> > >> >     > 3) I believe we're all roughly happy with the header value being
>> >> >>> > >> >     > a byte[]?
>> >> >>> > >> >     >
>> >> >>> > >> >     > 4) I believe consensus has been for a namespace-based int
>> >> >>> > >> >     > approach {int,int} for the key. Any objections if this is what we
>> >> >>> > >> >     > go with?
>> >> >>> > >> >     >
>> >> >>> > >> >     > 5) If the assumption in (4) is correct, we have {int,int} keys.
>> >> >>> > >> >     > Should both ints be int16 or int32?
>> >> >>> > >> >     > I'm for them being int16 (2 bytes), as combined that is a space of
>> >> >>> > >> >     > 4 bytes as per the original, gives plenty of combinations for the
>> >> >>> > >> >     > foreseeable future, and keeps the overhead small.
>> >> >>> > >> >     >
>> >> >>> > >> >     > Do we see any benefit in another kip call to discuss these at
>> >> >>> > >> >     > all?
>> >> >>> > >> >     >
>> >> >>> > >> >     > Cheers
>> >> >>> > >> >     > Mike
>> >> >>> > >> >     > ________________________________________
>> >> >>> > >> >     > From: K Burstev <[email protected]>
>> >> >>> > >> >     > Sent: Friday, November 18, 2016 7:07:07 AM
>> >> >>> > >> >     > To: [email protected]
>> >> >>> > >> >     > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> >> >>> > >> >     >
>> >> >>> > >> >     > For what it is worth also i agree. As a user:
>> >> >>> > >> >     >
>> >> >>> > >> >     >  1) Yes - Headers are worthwhile
>> >> >>> > >> >     >  2) Yes - Headers should be a top level option
>> >> >>> > >> >     >
>> >> >>> > >> >     > 14.11.2016, 21:15, "Ignacio Solis" <[email protected]>:
>> >> >>> > >> >     > > 1) Yes - Headers are worthwhile
>> >> >>> > >> >     > > 2) Yes - Headers should be a top level option
>> >> >>> > >> >     > >
>> >> >>> > >> >     > > On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <[email protected]> wrote:
>> >> >>> > >> >     > >
>> >> >>> > >> >     > >>  Hi Roger,
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  The kip details/examples the original proposal for key spacing,
>> >> >>> > >> >     > >>  not the newly mentioned (as per discussion) namespace idea.
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  We will need to update the kip when we get agreement this is a
>> >> >>> > >> >     > >>  better approach (which seems to be the case if I have understood
>> >> >>> > >> >     > >>  the general feeling in the conversation).
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  Re the variable ints, at a very early stage we did think about
>> >> >>> > >> >     > >>  this. I think the added complexity for the saving isn't worth it.
>> >> >>> > >> >     > >>  If we want to reduce overheads and size, I'd rather go with int16
>> >> >>> > >> >     > >>  (2 bytes) keys as it keeps it simple.
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  On the note of no headers: as per the kip, we use an attribute
>> >> >>> > >> >     > >>  bit to denote whether headers are present, which gives zero
>> >> >>> > >> >     > >>  overhead currently if headers are not used.
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  I think, as radai mentions, it would be good first to get clarity
>> >> >>> > >> >     > >>  on whether we now have general consensus that (1) headers are
>> >> >>> > >> >     > >>  worthwhile and useful, and (2) we want it as a top level entity.
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  Just to state the obvious, I believe (1) headers are worthwhile
>> >> >>> > >> >     > >>  and (2) agree as a top level entity.
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  Cheers
>> >> >>> > >> >     > >>  Mike
>> >> >>> > >> >     > >>  ________________________________________
>> >> >>> > >> >     > >>  From: Roger Hoover <[email protected]>
>> >> >>> > >> >     > >>  Sent: Wednesday, November 9, 2016 9:10:47 PM
>> >> >>> > >> >     > >>  To: [email protected]
>> >> >>> > >> >     > >>  Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  Sorry for going a little in the weeds but thanks for the replies
>> >> >>> > >> >     > >>  regarding varint.
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  Agreed that a prefix and {int, int} can be the same. It doesn't
>> >> >>> > >> >     > >>  look like that's what the KIP is saying in the "Open" section. The
>> >> >>> > >> >     > >>  example shows 2100001 for New Relic and 210002 for App Dynamics,
>> >> >>> > >> >     > >>  implying that the New Relic organization will have only a single
>> >> >>> > >> >     > >>  header id to work with. Or is 2100001 a prefix? The main point of
>> >> >>> > >> >     > >>  a namespace or prefix is to reduce the overhead of config mapping
>> >> >>> > >> >     > >>  or registration depending on how namespaces/prefixes are managed.
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  Would love to hear more feedback on the higher-level questions
>> >> >>> > >> >     > >>  though...
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  Cheers,
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  Roger
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  On Wed, Nov 9, 2016 at 11:38 AM, radai <[email protected]> wrote:
>> >> >>> > >> >     > >>
>> >> >>> > >> >     > >>  > I think this discussion is getting a bit into the weeds on
>> >> >>> > >> >     > >>  > technical implementation details.
>> >> >>> > >> >     > >>  > I'd like to step back a minute and try and establish where we
>> >> >>> > >> >     > >>  > are in the larger picture:
>> >> >>> > >> >     > >>  >
>> >> >>> > >> >     > >>  > (re-wording nacho's last paragraph)
>> >> >>> > >> >     > >>  > 1. are we all in agreement that headers are a worthwhile and
>> >> >>> > >> >     > >>  > useful addition to have? this was contested early on
>> >> >>> > >> >     > >>  > 2. are we all in agreement on headers as top level entity vs
>> >> >>> > >> >     > >>  > headers squirreled-away in V?
>> >> >>> > >> >     > >>  >
>> >> >>> > >> >     > >>  > are there still concerns around these 2 points (#jay? #jun?)?
>> >> >>> > >> >     > >>  >
>> >> >>> > >> >     > >>  > (and now back to our normal programming ...)
>> >> >>> > >> >     > >>  >
>> >> >>> > >> >     > >>  > varints are nice. having said that, its adding complexity (see
>> >> >>> > >> >     > >>  > https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java
>> >> >>> > >> >     > >>  > as 1st google result) and would require anyone writing other
>> >> >>> > >> >     > >>  > clients (C? Python? Go? Bash? ;-) ) to get/implement the same,
>> >> >>> > >> >     > >>  > and for relatively little gain (int vs string is order of
>> >> >>> > >> >     > >>  > magnitude, this isnt).
>> >> >>> > >> >     > >>  >
>> >> >>> > >> >     > >>  > int namespacing vs {int, int} namespacing are basically the same
>> >> >>> > >> >     > >>  > thing - youre just namespacing an int64 and giving people whole
>> >> >>> > >> >     > >>  > 2^32 ranges at a time. the part i like about this is letting
>> >> >>> > >> >     > >>  > people have a large swath of numbers with one registration so
>> >> >>> > >> >     > >>  > they dont have to come back for every single plugin/header they
>> >> >>> > >> >     > >>  > want to "reserve".
>> >> >>> > >> >     > >>  >
>> >> >>> > >> >     > >>  >
>> >> >>> > >> >     > >>  > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <[email protected]> wrote:
>> >> >>> > >> >     > >>  >
>> >> >>> > >> >     > >>  > > Since some of the debate has been about overhead + performance,
>> >> >>> > >> >     > >>  > > I'm wondering if we have considered a varint encoding (
>> >> >>> > >> >     > >>  > > https://developers.google.com/protocol-buffers/docs/encoding#varints)
>> >> >>> > >> >     > >>  > > for the header length field (int32 in the proposal) and for
>> >> >>> > >> >     > >>  > > header ids? If you don't use headers, the overhead would be a
>> >> >>> > >> >     > >>  > > single byte and for each header id < 128 would also need only a
>> >> >>> > >> >     > >>  > > single byte?
>> >> >>> > >> >     > >>  > >
>> >> >>> > >> >     > >>  > >
>> >> >>> > >> >     > >>  > >
>> >> >>> > >> >     > >>  > > On Wed, Nov 9, 2016 at 6:43 AM, radai <[email protected]> wrote:
>> >> >>> > >> >     > >>  > >
>> >> >>> > >> >     > >>  > > > @magnus - and very dangerous (youre
>> essentially
>> >> >>> > >> > downloading and
>> >> >>> > >> >     > >>  > executing
>> >> >>> > >> >     > >>  > > > arbitrary code off the internet on your
>> servers
>> >> ...
>> >> >>> > bad
>> >> >>> > >> > idea
>> >> >>> > >> >     > without
>> >> >>> > >> >     > >>  a
>> >> >>> > >> >     > >>  > > > sandbox, even with)
>> >> >>> > >> >     > >>  > > >
>> >> >>> > >> >     > >>  > > > as for it being a purely administrative task
>> - i
>> >> >>> > >> disagree.
>> >> >>> > >> >     > >>  > > >
>> >> >>> > >> >     > >>  > > > i wish it would, really, because then my
>> earlier
>> >> >>> > point on
>> >> >>> > >> > the
>> >> >>> > >> >     > >>  > complexity
>> >> >>> > >> >     > >>  > > of
>> >> >>> > >> >     > >>  > > > the remapping process would be invalid, but
>> at
>> >> >>> > linkedin,
>> >> >>> > >> > for
>> >> >>> > >> >     > example,
>> >> >>> > >> >     > >>  > we
>> >> >>> > >> >     > >>  > > > (the team im in) run kafka as a service. we
>> dont
>> >> >>> > really
>> >> >>> > >> > know
>> >> >>> > >> >     > what our
>> >> >>> > >> >     > >>  > > users
>> >> >>> > >> >     > >>  > > > (developing applications that use kafka) are
>> up
>> >> to
>> >> >>> at
>> >> >>> > any
>> >> >>> > >> > given
>> >> >>> > >> >     > >>  moment.
>> >> >>> > >> >     > >>  > > it
>> >> >>> > >> >     > >>  > > > is very possible (given the existence of
>> headers
>> >> >>> and a
>> >> >>> > >> >     > corresponding
>> >> >>> > >> >     > >>  > > plugin
>> >> >>> > >> >     > >>  > > > ecosystem) for some application to "equip"
>> their
>> >> >>> > >> producers
>> >> >>> > >> > and
>> >> >>> > >> >     > >>  > consumers
>> >> >>> > >> >     > >>  > > > with the required plugin without us knowing.
>> i
>> >> dont
>> >> >>> > mean
>> >> >>> > >> > to imply
>> >> >>> > >> >     > >>  thats
>> >> >>> > >> >     > >>  > > > bad, i just want to make the point that its
>> not
>> >> as
>> >> >>> > simple
>> >> >>> > >> >     > keeping it
>> >> >>> > >> >     > >>  in
>> >> >>> > >> >     > >>  > > > sync across a large-enough organization.
>> >> >>> > >> >     > >>  > > >
>> >> >>> > >> >     > >>  > > >
>> >> >>> > >> >     > >>  > > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus
>> Edenhill
>> >> <
>> >> >>> > >> >     > [email protected]>
>> >> >>> > >> >     > >>  > > > wrote:
>> >> >>> > >> >     > >>  > > >
>> >> >>> > >> >     > >>  > > > > I think there is a piece missing in the
>> >> Strings
>> >> >>> > >> > discussion,
>> >> >>> > >> >     > where
>> >> >>> > >> >     > >>  > > > > pro-Stringers
>> >> >>> > >> >     > >>  > > > > reason that by providing unique string
>> >> >>> identifiers
>> >> >>> > for
>> >> >>> > >> > each
>> >> >>> > >> >     > header
>> >> >>> > >> >     > >>  > > > > everything will just
>> >> >>> > >> >     > >>  > > > > magically work for all parts of the stream
>> >> >>> pipeline.
>> >> >>> > >> >     > >>  > > > >
>> >> >>> > >> >     > >>  > > > > But the strings dont mean anything by
>> >> themselves,
>> >> >>> > and
>> >> >>> > >> > while we
>> >> >>> > >> >     > >>  could
>> >> >>> > >> >     > >>  > > > > probably envision
>> >> >>> > >> >     > >>  > > > > some auto plugin loader that downloads,
>> >> compiles,
>> >> >>> > links
>> >> >>> > >> > and
>> >> >>> > >> >     > runs
>> >> >>> > >> >     > >>  > > plugins
>> >> >>> > >> >     > >>  > > > > on-demand
>> >> >>> > >> >     > >>  > > > > as soon as they're seen by a consumer, I
>> dont
>> >> >>> really
>> >> >>> > >> see
>> >> >>> > >> > a
>> >> >>> > >> >     > use-case
>> >> >>> > >> >     > >>  > for
>> >> >>> > >> >     > >>  > > > > something
>> >> >>> > >> >     > >>  > > > > so dynamic (and fragile) in practice.
>> >> >>> > >> >     > >>  > > > >
>> >> >>> > >> >     > >>  > > > > In the real world an application will be
>> >> >>> configured
>> >> >>> > >> with
>> >> >>> > >> > a set
>> >> >>> > >> >     > of
>> >> >>> > >> >     > >>  > > plugins
>> >> >>> > >> >     > >>  > > > > to either add (producer)
>> >> >>> > >> >     > >>  > > > > or read (consumer) headers.
>> >> >>> > >> >     > >>  > > > > This is an administrative task based on
>> what
>> >> >>> > features a
>> >> >>> > >> > client
>> >> >>> > >> >     > >>  > > > > needs/provides and results in
>> >> >>> > >> >     > >>  > > > > some sort of configuration to enable and
>> >> >>> configure
>> >> >>> > the
>> >> >>> > >> > desired
>> >> >>> > >> >     > >>  > plugins.
>> >> >>> > >> >     > >>  > > > >
>> >> >>> > >> >     > >>  > > > > Since this needs to be kept somewhat in
>> sync
>> >> >>> across
>> >> >>> > an
>> >> >>> > >> >     > organisation
>> >> >>> > >> >     > >>  > > > (there
>> >> >>> > >> >     > >>  > > > > is no point in having producers
>> >> >>> > >> >     > >>  > > > > add headers no consumers will read, and
>> vice
>> >> >>> versa),
>> >> >>> > >> the
>> >> >>> > >> > added
>> >> >>> > >> >     > >>  > > complexity
>> >> >>> > >> >     > >>  > > > > of assigning an id namespace
>> >> >>> > >> >     > >>  > > > > for each plugin as it is being configured
>> >> should
>> >> >>> be
>> >> >>> > >> > tolerable.
>> >> >>> > >> >     > >>  > > > >
>> >> >>> > >> >     > >>  > > > >
>> >> >>> > >> >     > >>  > > > > /Magnus
>> >> >>> > >> >     > >>  > > > >
>> >> >>> > >> >     > >>  > > > > 2016-11-09 13:06 GMT+01:00 Michael Pearce <
>> >> >>> > >> >     > [email protected]>:
>> >> >>> > >> >     > >>  > > > >
>> >> >>> > >> >     > >>  > > > > > Just following/catching up on what seems
>> to
>> >> be
>> >> >>> an
>> >> >>> > >> > active
>> >> >>> > >> >     > night :)
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > @Radai sorry if it may seem obvious but
>> what
>> >> >>> does
>> >> >>> > MD
>> >> >>> > >> > stand
>> >> >>> > >> >     > for?
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > My take on String vs Int:
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > I will state first I am pro Int (16 or
>> 32).
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > I do though playing devils advocate see a
>> >> big
>> >> >>> plus
>> >> >>> > >> > with the
>> >> >>> > >> >     > >>  > argument
>> >> >>> > >> >     > >>  > > of
>> >> >>> > >> >     > >>  > > > > > String keys, this is around integrating
>> >> into an
>> >> >>> > >> > existing
>> >> >>> > >> >     > >>  > eco-system.
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > As many other systems use String based
>> >> headers
>> >> >>> > >> (Flume,
>> >> >>> > >> > JMS)
>> >> >>> > >> >     > it
>> >> >>> > >> >     > >>  > makes
>> >> >>> > >> >     > >>  > > > it
>> >> >>> > >> >     > >>  > > > > > much easier for these to be
>> >> >>> > incorporated/integrated
>> >> >>> > >> > into.
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > How with Int based headers could we
>> provide
>> >> a
>> >> >>> > >> > way/guidance to
>> >> >>> > >> >     > >>  make
>> >> >>> > >> >     > >>  > > this
>> >> >>> > >> >     > >>  > > > > > integration simple / easy with transition
>> >> flows
>> >> >>> > over
>> >> >>> > >> to
>> >> >>> > >> >     > kafka?
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > * tough luck buddy you're on your own
>> >> >>> > >> >     > >>  > > > > > * simply hash the string into int code
>> and
>> >> hope
>> >> >>> > for
>> >> >>> > >> no
>> >> >>> > >> >     > collisions
>> >> >>> > >> >     > >>  > > (how
>> >> >>> > >> >     > >>  > > > to
>> >> >>> > >> >     > >>  > > > > > convert back though?)
>> >> >>> > >> >     > >>  > > > > > * http2 style as mentioned by nacho.
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > cheers,
>> >> >>> > >> >     > >>  > > > > > Mike
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > ________________________________________
>> >> >>> > >> >     > >>  > > > > > From: radai <[email protected]>
>> >> >>> > >> >     > >>  > > > > > Sent: Wednesday, November 9, 2016 8:12 AM
>> >> >>> > >> >     > >>  > > > > > To: [email protected]
>> >> >>> > >> >     > >>  > > > > > Subject: Re: [DISCUSS] KIP-82 - Add
>> Record
>> >> >>> Headers
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > thinking about it some more, the best
>> way to
>> >> >>> > transmit
>> >> >>> > >> > the
>> >> >>> > >> >     > header
>> >> >>> > >> >     > >>  > > > > remapping
>> >> >>> > >> >     > >>  > > > > > data to consumers would be to put it in
>> the
>> >> MD
>> >> >>> > >> response
>> >> >>> > >> >     > payload,
>> >> >>> > >> >     > >>  so
>> >> >>> > >> >     > >>  > > > maybe
>> >> >>> > >> >     > >>  > > > > > it should be discussed now.
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > On Wed, Nov 9, 2016 at 12:09 AM, radai <
>> >> >>> > >> >     > >>  [email protected]
>> >> >>> > >> >     > >>  > >
>> >> >>> > >> >     > >>  > > > > wrote:
>> >> >>> > >> >     > >>  > > > > >
>> >> >>> > >> >     > >>  > > > > > > im not opposed to the idea of namespace
>> >> >>> mapping.
>> >> >>> > >> all
>> >> >>> > >> > im
>> >> >>> > >> >     > saying
>> >> >>> > >> >     > >>  is
>> >> >>> > >> >     > >>  > > > that
>> >> >>> > >> >     > >>  > > > > > its
>> >> >>> > >> >     > >>  > > > > > > not part of the "mvp" and, since it
>> >> requires
>> >> >>> no
>> >> >>> > >> wire
>> >> >>> > >> > format
>> >> >>> > >> >     > >>  > change,
>> >> >>> > >> >     > >>  > > > can
>> >> >>> > >> >     > >>  > > > > > > always be added later.
>> >> >>> > >> >     > >>  > > > > > > also, its not as simple as just
>> >> configuring
>> >> >>> MM
>> >> >>> > to
>> >> >>> > >> do
>> >> >>> > >> > the
>> >> >>> > >> >     > >>  > transform:
>> >> >>> > >> >     > >>  > > > > lets
>> >> >>> > >> >     > >>  > > > > > > say i've implemented large message
>> >> support as
>> >> >>> > >> > {666,1} and
>> >> >>> > >> >     > on
>> >> >>> > >> >     > >>  some
>> >> >>> > >> >     > >>  > > > > mirror
>> >> >>> > >> >     > >>  > > > > > > target cluster its been remapped to
>> >> {999,1}.
>> >> >>> the
>> >> >>> > >> > consumer
>> >> >>> > >> >     > >>  plugin
>> >> >>> > >> >     > >>  > > code
>> >> >>> > >> >     > >>  > > > > > would
>> >> >>> > >> >     > >>  > > > > > > also need to be told to look for the
>> large
>> >> >>> > message
>> >> >>> > >> > "part X
>> >> >>> > >> >     > of
>> >> >>> > >> >     > >>  Y"
>> >> >>> > >> >     > >>  > > > header
>> >> >>> > >> >     > >>  > > > > > > under {999,1}. doable, but tricky.
>> >> >>> > >> >     > >>  > > > > > >
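To make the remapping problem above concrete, here is a rough sketch. It assumes headers are held as a mapping keyed by (namespace, id) pairs; remap_namespaces and LargeMessagePlugin are hypothetical names for illustration, not part of Kafka or MirrorMaker, and collisions between remapped keys are not handled.

```python
from typing import Dict, Optional, Tuple

HeaderKey = Tuple[int, int]  # (namespace, id); an assumed representation


def remap_namespaces(headers: Dict[HeaderKey, bytes],
                     mapping: Dict[int, int]) -> Dict[HeaderKey, bytes]:
    """Rewrite namespaces as a mirroring step might; unmapped ones pass through."""
    return {(mapping.get(ns, ns), hid): value
            for (ns, hid), value in headers.items()}


class LargeMessagePlugin:
    """Hypothetical consumer plugin that reads the 'part X of Y' header."""

    PART_HEADER_ID = 1

    def __init__(self, namespace: int) -> None:
        self.namespace = namespace  # must match the cluster's namespace

    def read_part(self, headers: Dict[HeaderKey, bytes]) -> Optional[bytes]:
        return headers.get((self.namespace, self.PART_HEADER_ID))


source_headers = {(666, 1): b"part 2 of 5"}
mirrored = remap_namespaces(source_headers, {666: 999})

stale_plugin = LargeMessagePlugin(namespace=666)   # still looks under {666,1}
synced_plugin = LargeMessagePlugin(namespace=999)  # told about the remap
```

The mirroring step alone is not enough: a plugin still configured with 666 finds nothing on the target cluster, so the consumer side has to learn about the new namespace as well.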
>> >> >>> > >> >     > >>  > > > > > > On Tue, Nov 8, 2016 at 10:29 PM, Gwen
>> >> >>> Shapira <
>> >> >>> > >> >     > >>  [email protected]
>> >> >>> > >> >     > >>  > >
>> >> >>> > >> >     > >>  > > > > wrote:
>> >> >>> > >> >     > >>  > > > > > >
>> >> >>> > >> >     > >>  > > > > > >> While you can do whatever you want
>> with a
>> >> >>> > >> namespace
>> >> >>> > >> > and
>> >> >>> > >> >     > your
>> >> >>> > >> >     > >>  > code,
>> >> >>> > >> >     > >>  > > > > > >> what I'd expect is for each app to
>> >> >>> make namespaces
>> >> >>> > >> >     > configurable...
>> >> >>> > >> >     > >>  > > > > > >>
>> >> >>> > >> >     > >>  > > > > > >> So if I accidentally used 666 for my
>> HR
>> >> >>> > >> department,
>> >> >>> > >> > and
>> >> >>> > >> >     > still
>> >> >>> > >> >     > >>  > want
>> >> >>> > >> >     > >>  > > > to
>> >> >>> > >> >     > >>  > > > > > >> run RadaiApp, I can config
>> "namespace=42"
>> >> >>> for
>> >> >>> > >> > RadaiApp and
>> >> >>> > >> >     > >>  > > > everything
>> >> >>> > >> >     > >>  > > > > > >> will look normal.
>> >> >>> > >> >     > >>  > > > > > >>
>> >> >>> > >> >     > >>  > > > > > >> This means you only need to sync usage
>> >> >>> inside
>> >> >>> > your
>> >> >>> > >> > own
>> >> >>> > >> >     > >>  > > organization.
>> >> >>> > >> >     > >>  > > > > > >> Still hard, but somewhat easier than
>> >> syncing
>> >> >>> > with
>> >> >>> > >> > the
>> >> >>> > >> >     > entire
>> >> >>> > >> >     > >>  > > world.
>> >> >>> > >> >     > >>  > > > > > >>
>> >> >>> > >> >     > >>  > > > > > >> On Tue, Nov 8, 2016 at 10:07 PM,
>> radai <
>> >> >>> > >> >     > >>  > > [email protected]>
>> >> >>> > >> >     > >>  > > > > > >> wrote:
>> >> >>> > >> >     > >>  > > > > > >> > and we can start with {namespace,
>> id}
>> >> and
>> >> >>> no
>> >> >>> > >> > re-mapping
>> >> >>> > >> >     > >>  > support
>> >> >>> > >> >     > >>  > > > and
>> >> >>> > >> >     > >>  > > > > > >> always
>> >> >>> > >> >     > >>  > > > > > >> > add it later on if/when collisions
>> >> >>> actually
>> >> >>> > >> > happen (i
>> >> >>> > >> >     > dont
>> >> >>> > >> >     > >>  > think
>> >> >>> > >> >     > >>  > > > > > they'd
>> >> >>> > >> >     > >>  > > > > > >> be
>> >> >>> > >> >     > >>  > > > > > >> > a problem).
>> >> >>> > >> >     > >>  > > > > > >> >
>> >> >>> > >> >     > >>  > > > > > >> > every interested party (so orgs or
>> >> >>> > individuals)
>> >> >>> > >> > could
>> >> >>> > >> >     > then
>> >> >>> > >> >     > >>  > > > register
>> >> >>> > >> >     > >>  > > > > a
>> >> >>> > >> >     > >>  > > > > > >> > prefix (0 = reserved, 1 = confluent
>> ...
>> >> >>> 666
>> >> >>> > = me
>> >> >>> > >> > :-) )
>> >> >>> > >> >     > and
>> >> >>> > >> >     > >>  do
>> >> >>> > >> >     > >>  > > > > whatever
>> >> >>> > >> >     > >>  > > > > > >> with
>> >> >>> > >> >     > >>  > > > > > >> > the 2nd ID - so once linkedin
>> >> registers,
>> >> >>> say
>> >> >>> > 3,
>> >> >>> > >> > then
>> >> >>> > >> >     > >>  linkedin
>> >> >>> > >> >     > >>  > > devs
>> >> >>> > >> >     > >>  > > > > are
>> >> >>> > >> >     > >>  > > > > > >> free
>> >> >>> > >> >     > >>  > > > > > >> > to use {3, *} with a reasonable
>> >> >>> expectation
>> >> >>> > not
>> >> >>> > >> to
>> >> >>> > >> >     > collide
>> >> >>> > >> >     > >>  with
>> >> >>> > >> >     > >>  > > > > > anything
>> >> >>> > >> >     > >>  > > > > > >> > else. further partitioning of that *
>> >> >>> becomes
>> >> >>> > >> > linkedin's
>> >> >>> > >> >     > >>  > problem,
>> >> >>> > >> >     > >>  > > > but
>> >> >>> > >> >     > >>  > > > > > the
>> >> >>> > >> >     > >>  > > > > > >> > "upstream registration" of a
>> namespace
>> >> >>> only
>> >> >>> > has
>> >> >>> > >> to
>> >> >>> > >> >     > happen
>> >> >>> > >> >     > >>  > once.
>> >> >>> > >> >     > >>  > > > > > >> >
>> >> >>> > >> >     > >>  > > > > > >> > On Tue, Nov 8, 2016 at 9:03 PM,
>> James
>> >> >>> Cheng <
>> >> >>> > >> >     > >>  > > [email protected]
>> >> >>> > >> >     > >>  > > > >
>> >> >>> > >> >     > >>  > > > > > >> wrote:
>> >> >>> > >> >     > >>  > > > > > >> >
>> >> >>> > >> >     > >>  > > > > > >> >>
>> >> >>> > >> >     > >>  > > > > > >> >>
>> >> >>> > >> >     > >>  > > > > > >> >>
>> >> >>> > >> >     > >>  > > > > > >> >> > On Nov 8, 2016, at 5:54 PM, Gwen
>> >> >>> Shapira <
>> >> >>> > >> >     > >>  > [email protected]>
>> >> >>> > >> >     > >>  > > > > > wrote:
>> >> >>> > >> >     > >>  > > > > > >> >> >
>> >> >>> > >> >     > >>  > > > > > >> >> > Thank you so much for this clear
>> and
>> >> >>> fair
>> >> >>> > >> > summary of
>> >> >>> > >> >     > the
>> >> >>> > >> >     > >>  > > > > arguments.
>> >> >>> > >> >     > >>  > > > > > >> >> >
>> >> >>> > >> >     > >>  > > > > > >> >> > I'm in favor of ints. Not a
>> >> >>> deal-breaker,
>> >> >>> > but
>> >> >>> > >> > in
>> >> >>> > >> >     > favor.
>> >> >>> > >> >     > >>  > > > > > >> >> >
>> >> >>> > >> >     > >>  > > > > > >> >> > Even more in favor of Magnus's
>> >> >>> > decentralized
>> >> >>> > >> >     > suggestion
>> >> >>> > >> >     > >>  > with
>> >> >>> > >> >     > >>  > > > > > Roger's
>> >> >>> > >> >     > >>  > > > > > >> >> > tweak: add a namespace for
>> headers.
>> >> >>> This
>> >> >>> > will
>> >> >>> > >> > allow
>> >> >>> > >> >     > each
>> >> >>> > >> >     > >>  > app
>> >> >>> > >> >     > >>  > > to
>> >> >>> > >> >     > >>  > > > > > just
>> >> >>> > >> >     > >>  > > > > > >> >> > use whatever IDs it wants
>> >> internally,
>> >> >>> and
>> >> >>> > >> then
>> >> >>> > >> > let
>> >> >>> > >> >     > the
>> >> >>> > >> >     > >>  > admin
>> >> >>> > >> >     > >>  > > > > > >> deploying
>> >> >>> > >> >     > >>  > > > > > >> >> > the app figure out an available
>> >> >>> namespace
>> >> >>> > ID
>> >> >>> > >> > for the
>> >> >>> > >> >     > app
>> >> >>> > >> >     > >>  to
>> >> >>> > >> >     > >>  > > > live
>> >> >>> > >> >     > >>  > > > > > in.
>> >> >>> > >> >     > >>  > > > > > >> >> > So io.confluent.schema-registry
>> can
>> >> be
>> >> >>> > >> > namespace
>> >> >>> > >> >     > 0x01 on
>> >> >>> > >> >     > >>  my
>> >> >>> > >> >     > >>  > > > > > >> deployment
>> >> >>> > >> >     > >>  > > > > > >> >> > and 0x57 on yours, and the poor
>> guys
>> >> >>> > >> > developing the
>> >> >>> > >> >     > app
>> >> >>> > >> >     > >>  > don't
>> >> >>> > >> >     > >>  > > > > need
>> >> >>> > >> >     > >>  > > > > > to
>> >> >>> > >> >     > >>  > > > > > >> >> > worry about that.
>> >> >>> > >> >     > >>  > > > > > >> >> >
>> >> >>> > >> >     > >>  > > > > > >> >>
>> >> >>> > >> >     > >>  > > > > > >> >> Gwen, if I understand your example
>> >> >>> right, an
>> >> >>> > >> >     > application
>> >> >>> > >> >     > >>  > > deployer
>> >> >>> > >> >     > >>  > > > > > might
>> >> >>> > >> >     > >>  > > > > > >> >> decide to use 0x01 in one
>> deployment,
>> >> and
>> >> >>> > that
>> >> >>> > >> > means
>> >> >>> > >> >     > that
>> >> >>> > >> >     > >>  > once
>> >> >>> > >> >     > >>  > > > the
>> >> >>> > >> >     > >>  > > > > > >> message
>> >> >>> > >> >     > >>  > > > > > >> >> is written into the broker, it
>> will be
>> >> >>> > saved on
>> >> >>> > >> > the
>> >> >>> > >> >     > broker
>> >> >>> > >> >     > >>  > with
>> >> >>> > >> >     > >>  > > > > that
>> >> >>> > >> >     > >>  > > > > > >> >> specific namespace (0x01).
>> >> >>> > >> >     > >>  > > > > > >> >>
>> >> >>> > >> >     > >>  > > > > > >> >> If you were to mirror that message
>> >> into
>> >> >>> > another
>> >> >>> > >> >     > cluster,
>> >> >>> > >> >     > >>  the
>> >> >>> > >> >     > >>  > > 0x01
>> >> >>> > >> >     > >>  > > > > > would
>> >> >>> > >> >     > >>  > > > > > >> >> accompany the message, right? What
>> if
>> >> the
>> >> >>> > >> > deployers of
>> >> >>> > >> >     > the
>> >> >>> > >> >     > >>  > same
>> >> >>> > >> >     > >>  > > > app
>> >> >>> > >> >     > >>  > > > > > in
>> >> >>> > >> >     > >>  > > > > > >> the
>> >> >>> > >> >     > >>  > > > > > >> >> other cluster uses 0x57? They won't
>> >> >>> > understand
>> >> >>> > >> > each
>> >> >>> > >> >     > other?
>> >> >>> > >> >     > >>  > > > > > >> >>
>> >> >>> > >> >     > >>  > > > > > >> >> I'm not sure that's an avoidable
>> >> >>> problem. I
>> >> >>> > >> > think it
>> >> >>> > >> >     > simply
>> >> >>> > >> >     > >>  > > means
>> >> >>> > >> >     > >>  > > > > > that
>> >> >>> > >> >     > >>  > > > > > >> in
>> >> >>> > >> >     > >>  > > > > > >> >> order to share data, you have to
>> also
>> >> >>> have a
>> >> >>> > >> > shared
>> >> >>> > >> >     > (agreed
>> >> >>> > >> >     > >>  > > upon)
>> >> >>> > >> >     > >>  > > > > > >> >> understanding of what the
>> namespaces
>> >> >>> mean.
>> >> >>> > >> Which
>> >> >>> > >> > I
>> >> >>> > >> >     > think
>> >> >>> > >> >     > >>  > makes
>> >> >>> > >> >     > >>  > > > > sense,
>> >> >>> > >> >     > >>  > > > > > >> >> because the alternate (sharing
>> >> *nothing*
>> >> >>> at
>> >> >>> > >> all)
>> >> >>> > >> > would
>> >> >>> > >> >     > mean
>> >> >>> > >> >     > >>  > > that
>> >> >>> > >> >     > >>  > > > > > there
>> >> >>> > >> >     > >>  > > > > > >> >> would be no way to understand each
>> >> other.
>> >> >>> > >> >     > >>  > > > > > >> >>
>> >> >>> > >> >     > >>  > > > > > >> >> -James
>> >> >>> > >> >     > >>  > > > > > >> >>
>> >> >>> > >> >     > >>  > > > > > >> >> > Gwen
>> >> >>> > >> >     > >>  > > > > > >> >> >
>> >> >>> > >> >     > >>  > > > > > >> >> > On Tue, Nov 8, 2016 at 4:23 PM,
>> >> radai <
>> >> >>> > >> >     > >>  > > > > [email protected]>
>> >> >>> > >> >     > >>  > > > > > >> >> wrote:
>> >> >>> > >> >     > >>  > > > > > >> >> >> +1 for sean's document. it
>> covers
>> >> >>> pretty
>> >> >>> > >> much
>> >> >>> > >> > all
>> >> >>> > >> >     > the
>> >> >>> > >> >     > >>  > > > trade-offs
>> >> >>> > >> >     > >>  > > > > > and
>> >> >>> > >> >     > >>  > > > > > >> >> >> provides concrete figures to
>> argue
>> >> >>> about
>> >> >>> > :-)
>> >> >>> > >> >     > >>  > > > > > >> >> >> (nit-picking - used the same
>> xkcd
>> >> >>> twice,
>> >> >>> > >> also
>> >> >>> > >> > trove
>> >> >>> > >> >     > has
>> >> >>> > >> >     > >>  > been
>> >> >>> > >> >     > >>  > > > > > >> superseded
>> >



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog
