*concert = convert 


> On 13 Dec 2017, at 05:35, Michael André Pearce <michael.andre.pea...@me.com> 
> wrote:
> Hi Randall
> What’s the main difference between this and my earlier alternative option PR
> https://github.com/apache/kafka/pull/2942/files
> If none then +1.
> From what I can tell, the only difference is that the headers you support
> can cross-convert primitive types, e.g. if the value after conversion is an
> integer you can still ask for float and it will type concert if possible.
> Cheers
> Mike
>> On 13 Dec 2017, at 01:36, Randall Hauch <rha...@gmail.com> wrote:
>> Trying to revive this after several months of inactivity....
>> I've spent quite a bit of time evaluating the current KIP-145 proposal and
>> several of the suggested PRs. The original KIP-145 proposal is relatively
>> minimalist (which is very nice), and it adopts Kafka's approach to headers
>> where header keys are strings and header values are byte arrays. IMO, this
>> places too much responsibility on the connector developers to know how to
>> serialize and deserialize, which means that it's going to be difficult to
>> assemble into pipelines connectors and stream processors that make
>> different, incompatible assumptions. It also makes Connect headers very
>> different than Connect's keys and values, which are generally structured
>> and describable with Connect schemas. I think we need Connect headers to do
>> more.
>> The other proposals attempt to do more, but even my first proposal doesn't
>> seem to really provide a solution that works for Connect users and
>> connector developers. After looking at this feature from a variety of
>> perspectives over several months, I now assert that Connect must solve two
>> orthogonal problems:
>> 1) Serialization: How different data types are (de)serialized as header
>> values
>> 2) Conversion: How values of one data type are converted to values of
>> another data type
>> For the serialization problem, Ewen suggested quite a while back that we
>> use something akin to `Converter` for header values. Unfortunately we can't
>> directly reuse `Converter` since the method signatures don't allow us to
>> supply the header name and the topic name, but we could define a
>> `HeaderConverter` that is similar to and compatible with `Converter` such
>> that a single class could implement both. This would align Connect
>> headers with how message keys and values are handled. Each connector could
>> define which converter it wants to use; for backward compatibility purposes
>> we use a header converter by default that serializes values to strings. If
>> you want something other than this default, you'd have to specify the
>> header converter options as part of the connector configuration; this
>> proposal changes the `StringConverter`, `ByteArrayConverter`, and
>> `JsonConverter` to all implement `HeaderConverter`, so these are all
>> options. This approach supposes that a connector will serialize all of its
>> headers in the same way -- with string-like representations by default. I
>> think this is a safe assumption for the short term, and if we need more
>> control to (de)serialize named headers differently for the same connector,
>> we can always implement a different `HeaderConverter` that gives users more
>> control.
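A minimal sketch of the serialization half described above, in plain Java with no Kafka dependency. The interface `SimpleHeaderConverter` and its method names are assumptions made for illustration and are not the API in the PR; the point is only that the converter receives the topic and header name along with the value, and that the default turns every value into a string.

```java
import java.nio.charset.StandardCharsets;

class HeaderConverterSketch {

    /** Hypothetical, simplified stand-in for the proposed `HeaderConverter`. */
    interface SimpleHeaderConverter {
        // Unlike a key/value converter, both directions see the header name.
        byte[] fromConnectHeader(String topic, String headerKey, Object value);
        Object toConnectHeader(String topic, String headerKey, byte[] raw);
    }

    /** Assumed default behaviour: every header value is written as a UTF-8 string. */
    static class StringHeaderConverter implements SimpleHeaderConverter {
        @Override
        public byte[] fromConnectHeader(String topic, String headerKey, Object value) {
            return value == null ? null : String.valueOf(value).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public Object toConnectHeader(String topic, String headerKey, byte[] raw) {
            return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        SimpleHeaderConverter converter = new StringHeaderConverter();
        // An integer header is serialized via its string form ...
        byte[] raw = converter.fromConnectHeader("orders", "retry-count", 3);
        // ... and read back as a String, not as an Integer.
        System.out.println(converter.toConnectHeader("orders", "retry-count", raw));
    }
}
```

Note that the round trip loses the original type (the integer comes back as a string), which is why the conversion problem is treated separately.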
>> So that would solve the serialization problem. How about connectors and
>> transforms that are implemented to expect a certain type of header value,
>> such as an integer or boolean or timestamp? We could solve this problem
>> (for the most part) by adding methods to the `Header` interface to get the
>> value in the desired type, and to support all of the sensible conversions
>> between Connect's primitives and logical types. So, a connector or
>> transform could always call `header.valueAsObject()` to get the raw
>> representation from the converter, but a connector or transform could also
>> get the string representation by calling `header.valueAsString()`, or the
>> INT64 representation by calling `header.valueAsLong()`, etc. We could even
>> have converting methods for the built-in logical types (e.g.,
>> `header.valueAsTimestamp()` to return a java.util.Date value that is
>> described by Connect's Timestamp logical type). We can convert between most
>> primitive and logical types (e.g., anything to a STRING, INT32 to FLOAT32,
>> etc.), but there are a few that don't make sense (e.g., ARRAY to FLOAT32,
>> INT32 to STRUCT, BYTE_ARRAY to anything, etc.), so these can throw a
>> `DataException`.
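The conversion half can be sketched in the same spirit. `SketchHeader` and the local `DataException` below are hypothetical stand-ins (the real types would come from Connect), and only a few of the conversions mentioned above are shown: number to long or float, string to number when it parses, and a failure for types that have no sensible conversion.

```java
import java.util.Arrays;

class HeaderConversionSketch {

    /** Local stand-in for Connect's DataException, so the sketch has no dependencies. */
    static class DataException extends RuntimeException {
        DataException(String message) { super(message); }
    }

    /** Hypothetical header holding whatever object the converter produced. */
    static final class SketchHeader {
        private final String key;
        private final Object value;

        SketchHeader(String key, Object value) {
            this.key = key;
            this.value = value;
        }

        Object valueAsObject() { return value; }

        String valueAsString() {
            if (value instanceof byte[]) throw fail("STRING"); // byte arrays are not converted
            return value == null ? null : String.valueOf(value);
        }

        long valueAsLong() {
            if (value instanceof Number) return ((Number) value).longValue(); // truncates fractions
            if (value instanceof String) {
                try {
                    return Long.parseLong(((String) value).trim());
                } catch (NumberFormatException e) {
                    throw fail("INT64");
                }
            }
            throw fail("INT64");
        }

        float valueAsFloat() {
            if (value instanceof Number) return ((Number) value).floatValue();
            if (value instanceof String) {
                try {
                    return Float.parseFloat(((String) value).trim());
                } catch (NumberFormatException e) {
                    throw fail("FLOAT32");
                }
            }
            throw fail("FLOAT32"); // e.g. arrays, structs, byte arrays
        }

        private DataException fail(String target) {
            String type = value == null ? "null" : value.getClass().getSimpleName();
            return new DataException("Header '" + key + "': cannot convert " + type + " to " + target);
        }
    }

    public static void main(String[] args) {
        System.out.println(new SketchHeader("retry-count", 3).valueAsFloat());
        System.out.println(new SketchHeader("offset", "17").valueAsLong());
        try {
            new SketchHeader("tags", Arrays.asList("a", "b")).valueAsFloat();
        } catch (DataException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A connector that only cares about one representation calls the matching method and lets the exception surface when the header holds something incompatible.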
>> I've refined this approach over the last few months, and have a PR for a
>> complete prototype that demonstrates these concepts and techniques:
>> https://github.com/apache/kafka/pull/4319
>> This PR does *not* update the documentation, though I can add that if we
>> approve of this approach. And, we probably want to define (at least on the
>> KIP) some relatively obvious SMTs for copying header values into record
>> key/value fields, and extracting record key/value fields into header values.
>> @Michael, would you mind if I edited KIP-145 to reflect this proposal? I
>> would be happy to keep the existing proposal at the end of the document (or
>> remove it if you prefer, since it's already in the page history), and we
>> can revise as we choose a direction.
>> Comments? Thoughts?
>> Best regards,
>> Randall
>> On Thu, Oct 19, 2017 at 2:10 PM, Michael André Pearce <
>> michael.andre.pea...@me.com> wrote:
>>> @rhauch
>>> Here is the previous discussion thread, just reigniting so we can discuss
>>> against the original kip thread
>>> Cheers
>>> Mike
>>>> On 5 May 2017, at 02:21, Michael Pearce <michael.pea...@ig.com> wrote:
>>>> Hi Ewen,
>>>> Did you get a chance to look at the updated sample showing the idea?
>>>> Did it help?
>>>> Cheers
>>>> Mike
>>>> ________________________________________
>>>> From: Michael Pearce <michael.pea...@ig.com>
>>>> Sent: Wednesday, May 3, 2017 10:11:55 AM
>>>> To: dev@kafka.apache.org
>>>> Subject: Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect
>>>> Hi Ewen,
>>>> I think code helps here, as I don’t think I explained what I meant very
>>>> well.
>>>> I have pushed what I was thinking to the branch/PR.
>>>> https://github.com/apache/kafka/pull/2942
>>>> The key bits added on top here are:
>>>> new ConnectHeader that holds the header key (as a string), the header
>>>> value object and the header value schema
>>>> new SubjectConverter which allows exposing a subject; in this case the
>>>> subject is the key. This can be used to register the header type in repos
>>>> like a schema registry, or in my case below in a property file.
>>>> We can default the subject converter to String-based or byte-based, where
>>>> all header values are treated safely as String or byte[] type.
>>>> But this way you could add in your own converter which could be more
>>>> sophisticated and convert the header based on the key.
>>>> The main part is to have access to the key, so you can look up the
>>>> header value type based on the key from somewhere, e.g. a properties file
>>>> or some central repo (e.g. a schema repo), where the repo subject could be
>>>> the topic + key, or just the key if the key type is global, and the schema
>>>> could be a primitive, String, byte[], or something more elaborate.
>>>> Cheers
>>>> Mike
>>>> On 03/05/2017, 06:00, "Ewen Cheslack-Postava" <e...@confluent.io> wrote:
>>>>  Michael,
>>>>  Aren't JMS headers an example where the variety is a problem? Unless I'm
>>>>  misunderstanding, there's not even a fixed serialization format expected
>>>>  for them since JMS defines the runtime types, not the wire format. For
>>>>  example, we have JMSCorrelationID (String), JMSExpiration (Long), and
>>>>  JMSReplyTo (Destination). These are simply runtime types, so we'd need
>>>>  either (a) a different serializer/deserializer for each or (b) a
>>>>  serializer/deserializer that can handle all of them (e.g. Avro, JSON, etc).
>>>>  What is the actual serialized format of the different fields? And if it's
>>>>  not specified anywhere in the KIP, why should using the well-known type for
>>>>  the header key (e.g. use StringSerializer, IntegerSerializer, etc) be better
>>>>  or worse than using a general serialization format (e.g. Avro, JSON)? And if
>>>>  the latter is the choice, how do you decide on the format?
>>>>  -Ewen
>>>>  On Tue, May 2, 2017 at 12:48 PM, Michael André Pearce <
>>>>  michael.andre.pea...@me.com> wrote:
>>>>> Hi Ewen,
>>>>> So on the point of JMS, the predefined/standardised JMS and JMSX headers
>>>>> have predefined types, so these can be serialised/deserialised accordingly.
>>>>> Custom JMS headers, agreed, could be a bit more difficult, but on the 80/20
>>>>> rule I would say they're mostly string values, and as you can hold bytes as
>>>>> a string anyhow, defaulting to that wouldn't cause any issue.
>>>>> But I think we may easily be able to do one better.
>>>>> Obviously you can override/configure the headers converter, but could we
>>>>> supply a default converter that takes a config file with a key-to-type
>>>>> mapping? That would allow people to define/declare a header key with its
>>>>> expected type in some property file, supporting String, byte[] and
>>>>> primitives, with undefined headers just defaulting to either String or
>>>>> byte[].
>>>>> We could also predefine known headers like the JMS ones mentioned above.
>>>>> E.g.
>>>>> AwesomeHeader1=boolean
>>>>> AwesomeHeader2=long
>>>>> JMSCorrelationId=String
>>>>> JMSXGroupId=String
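One way such a key-to-type mapping could be applied is sketched below. The class name `TypedHeaderMappingSketch`, the set of supported type names, and the assumption that header bytes hold UTF-8 text are all illustrative choices, not part of the KIP or either PR; undeclared keys fall back to String.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Properties;

class TypedHeaderMappingSketch {
    // Header key -> declared type name, loaded from property-file syntax.
    private final Properties types = new Properties();

    TypedHeaderMappingSketch(String mapping) {
        try {
            types.load(new StringReader(mapping));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Converts raw header bytes using the declared type; unknown keys default to String. */
    Object convert(String headerKey, byte[] raw) {
        String type = types.getProperty(headerKey, "String").trim().toLowerCase(Locale.ROOT);
        if (type.equals("byte[]")) {
            return raw; // pass bytes through untouched
        }
        String text = new String(raw, StandardCharsets.UTF_8); // assumes text-encoded values
        switch (type) {
            case "boolean": return Boolean.parseBoolean(text);
            case "int":     return Integer.parseInt(text);
            case "long":    return Long.parseLong(text);
            case "double":  return Double.parseDouble(text);
            default:        return text; // "string" and anything unrecognised
        }
    }

    public static void main(String[] args) {
        TypedHeaderMappingSketch mapping = new TypedHeaderMappingSketch(
                "AwesomeHeader1=boolean\nAwesomeHeader2=long\nJMSCorrelationId=String\n");
        byte[] raw = "42".getBytes(StandardCharsets.UTF_8);
        System.out.println(mapping.convert("AwesomeHeader2", raw).getClass().getSimpleName());
        System.out.println(mapping.convert("SomeUndeclaredHeader", raw).getClass().getSimpleName());
    }
}
```

In a real converter the mapping would be read from the connector configuration or a file rather than an inline string.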
>>>>> What do you think?
>>>>> Cheers
>>>>> Mike
>>>>>> On 2 May 2017, at 18:45, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>>>>> A couple of thoughts:
>>>>>> First, agreed that we definitely want to expose header functionality. Thank
>>>>>> you Mike for starting the conversation! Even if Connect doesn't do anything
>>>>>> special with it, there's value in being able to access/set headers.
>>>>>> On motivation -- I think there are much broader use cases. When thinking
>>>>>> about exposing headers, I'd actually use Replicator as only a minor
>>>>>> supporting case. The reason is that it is a very uncommon case where there
>>>>>> is zero impedance mismatch between the source and sink of the data since
>>>>>> they are both Kafka. This means you don't need to think much about data
>>>>>> formats/serialization. I think the JMS use case is a better example since
>>>>>> JMS headers and Kafka headers don't quite match up. Here's a quick list of
>>>>>> use cases I can think of off the top of my head:
>>>>>> 1. Include headers from other systems that support them: JMS (or really any
>>>>>> MQ), HTTP
>>>>>> 2. Other connector-specific headers. For example, from JDBC maybe the table
>>>>>> the data comes from is a header; for a CDC connector you might include the
>>>>>> binlog offset as a header.
>>>>>> 3. Interceptor/SMT-style use cases for annotating things like provenance of
>>>>>> data:
>>>>>> 3a. Generically w/ user-supplied data like data center, host, app ID, etc.
>>>>>> 3b. Kafka Connect framework level info, such as the connector/task
>>>>>> generating the data
>>>>>> On deviation from Connect's model -- to be honest, KIP-82 also deviates
>>>>>> quite substantially from how Kafka handles data already, so we may struggle
>>>>>> a bit to reconcile the two. (In particular, headers specify some structure
>>>>>> and enforce strings specifically for header keys, but then require you to
>>>>>> do serialization of header values yourself...).
>>>>>> I think the use cases I mentioned above may also need different approaches
>>>>>> to how the data in headers are handled. As Gwen mentions, if we expose the
>>>>>> headers to Connectors, they need to have some idea of the format, and the
>>>>>> reason for byte[] values in KIP-82 is to leave that decision up to the
>>>>>> organization using them. But without knowing the format, connectors can't
>>>>>> really do anything with them -- if a source connector assumes a format,
>>>>>> it may generate data incompatible with the format used by the rest of the
>>>>>> organization. On the other hand, I have a feeling most people will just use
>>>>>> <String, String> headers, so allowing connectors to embed arbitrarily
>>>>>> complex data may not work out well in practice. Or maybe we leave it
>>>>>> flexible, most people default to using StringConverter for the serializer
>>>>>> and Connectors will end up defaulting to that just for compatibility...
>>>>>> I'm not sure I have a real proposal yet, but I do think understanding the
>>>>>> impact of using a Converter for headers would be useful, and we might want
>>>>>> to think about how this KIP would fit in with transformations (or if that
>>>>>> is something that can be deferred, handled separately from the existing
>>>>>> transformations, etc).
>>>>>> -Ewen
>>>>>> On Mon, May 1, 2017 at 11:52 AM, Michael Pearce <michael.pea...@ig.com>
>>>>>> wrote:
>>>>>>> Hi Gwen,
>>>>>>> The intent here was to allow tools that perform a similar role to
>>>>>>> MirrorMaker, replicating messages from one cluster to another. E.g.
>>>>>>> like MirrorMaker, they should just be taking and transferring the headers
>>>>>>> as is.
>>>>>>> We don't actually use this inside our company, so not exposing this isn't
>>>>>>> an issue for us. I just believe there are companies like Confluent who have
>>>>>>> tools like Replicator that do.
>>>>>>> And as good citizens I think we should complete the work and expose the
>>>>>>> headers the same as in the record, to at least allow them to replicate the
>>>>>>> messages as is. Note Steph seems to want it.
>>>>>>> Cheers
>>>>>>> Mike
>>>>>>> ________________________________________
>>>>>>> From: Gwen Shapira <g...@confluent.io>
>>>>>>> Sent: Monday, May 1, 2017 2:36:34 PM
>>>>>>> To: dev@kafka.apache.org
>>>>>>> Subject: Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect
>>>>>>> Hi,
>>>>>>> I'm excited to see the community expanding Connect in this direction!
>>>>>>> Headers + Transforms == Fun message routing.
>>>>>>> I like how clean the proposal is, but I'm concerned that it kinda deviates
>>>>>>> from how Connect handles data elsewhere.
>>>>>>> Unlike Kafka, Connect doesn't look at all data as byte-arrays; we have
>>>>>>> converters that take data in specific formats (JSON, Avro) and turn it
>>>>>>> into Connect data types (defined in the data API). I think it will be more
>>>>>>> consistent for connector developers to also get headers as some kind of
>>>>>>> structured or semi-structured data (and to expand the converters to handle
>>>>>>> header conversions as well).
>>>>>>> This will allow for Connect's separation of concerns - Connector developers
>>>>>>> don't worry about data formats (because they get the internal Connect
>>>>>>> objects) and Converters do all the data format work.
>>>>>>> Another thing, in my experience, APIs work better if they are put into use
>>>>>>> almost immediately - so difficulties in using the APIs are immediately
>>>>>>> surfaced. Are you planning any connectors that will use this feature (not
>>>>>>> necessarily in Kafka, just in general)? Or perhaps we can think of a way to
>>>>>>> expand Kafka's file connectors so they'll use headers somehow (can't think
>>>>>>> of anything, but maybe?).
>>>>>>> Gwen
>>>>>>> On Sat, Apr 29, 2017 at 12:12 AM, Michael Pearce <michael.pea...@ig.com>
>>>>>>> wrote:
>>>>>>>> Hi All,
>>>>>>>> Now KIP-82 is committed I would like to discuss extending the work to
>>>>>>>> expose it in Kafka Connect, its primary focus being so connectors that may
>>>>>>>> do similar tasks as MirrorMakers, either Kafka->Kafka or JMS->Kafka, would
>>>>>>>> be able to replicate the headers.
>>>>>>>> It would be ideal but not mandatory for this to go in the 0.11 release so
>>>>>>>> it is available on day one of headers being available.
>>>>>>>> Please find the KIP here:
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect
>>>>>>>> Please find an initial implementation as a PR here:
>>>>>>>> https://github.com/apache/kafka/pull/2942
>>>>>>>> Kind Regards
>>>>>>>> Mike
>>>>>>>> The information contained in this email is strictly confidential and
>>>>> for
>>>>>>>> the use of the addressee only, unless otherwise indicated. If you are
>>>>> not
>>>>>>>> the intended recipient, please do not read, copy, use or disclose to
>>>>>>> others
>>>>>>>> this message or any attachment. Please also notify the sender by
>>>>> replying
>>>>>>>> to this email or by telephone (+44(020 7896 0011) and then delete the
>>>>>>> email
>>>>>>>> and any copies of it. Opinions, conclusion (etc) that do not relate
>>> to
>>>>>>> the
>>>>>>>> official business of this company shall be understood as neither
>>> given
>>>>>>> nor
>>>>>>>> endorsed by it. IG is a trading name of IG Markets Limited (a company
>>>>>>>> registered in England and Wales, company number 04008957) and IG
>>> Index
>>>>>>>> Limited (a company registered in England and Wales, company number
>>>>>>>> 01190902). Registered address at Cannon Bridge House, 25 Dowgate
>>> Hill,
>>>>>>>> London EC4R 2YA. Both IG Markets Limited (register number 195355) and
>>>>> IG
>>>>>>>> Index Limited (register number 114059) are authorised and regulated
>>> by
>>>>>>> the
>>>>>>>> Financial Conduct Authority.
>>>>>>> --
>>>>>>> *Gwen Shapira*
>>>>>>> Product Manager | Confluent
>>>>>>> 650.450.2760 | @gwenshap
>>>>>>> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
>>>>>>> <http://www.confluent.io/blog>
