We still have a few blockers to fix in 0.8.2. When that's done, we can
discuss whether to do another 0.8.2 beta or just do the 0.8.2 final release.
Thanks,
Jun
On Wed, Dec 17, 2014 at 5:29 PM, Shannon Lloyd wrote:
>
> Are you guys planning another beta for everyone to try out the changes
> befo
Are you guys planning another beta for everyone to try out the changes
before you cut 0.8.2 final?
Cheers,
Shannon
On 18 December 2014 at 11:24, Rajiv Kurian wrote:
>
> Has the mvn repo been updated too?
>
> On Wed, Dec 17, 2014 at 4:31 PM, Jun Rao wrote:
> >
> > Thanks everyone for the feedbac
Has the mvn repo been updated too?
On Wed, Dec 17, 2014 at 4:31 PM, Jun Rao wrote:
>
> Thanks everyone for the feedback and the discussion. The proposed changes
> have been checked into both 0.8.2 and trunk.
>
> Jun
>
> On Tue, Dec 16, 2014 at 10:43 PM, Joel Koshy wrote:
> >
> > Jun,
> >
> > Tha
Thanks everyone for the feedback and the discussion. The proposed changes
have been checked into both 0.8.2 and trunk.
Jun
On Tue, Dec 16, 2014 at 10:43 PM, Joel Koshy wrote:
>
> Jun,
>
> Thanks for summarizing this - it helps confirm for me that I did not
> misunderstand anything in this thread
Jun,
Thanks for summarizing this - it helps confirm for me that I did not
misunderstand anything in this thread so far; and that I disagree with
the premise that the steps in using the current byte-oriented API is
cumbersome or inflexible. It involves instantiating the K-V
serializers in code (as
Joel,
With a byte array interface, of course there is nothing that one can't do.
However, the real question is that whether we want to encourage people to
use it this way or not. Being able to flow just bytes is definitely easier
to get started. That's why many early adopters choose to do it that
Documentation is inevitable even if the serializer/deserializer is
part of the API - since the user has to set it up in the configs. So
again, you can only encourage people to use it through documentation.
The simpler byte-oriented API seems clearer to me because anyone who
needs to send (or receiv
Joel,
It's just that if the serializer/deserializer is not part of the API, you
can only encourage people to use it through documentation. However, not
everyone will read the documentation if it's not directly used in the API.
Thanks,
Jun
On Mon, Dec 15, 2014 at 2:11 AM, Joel Koshy wrote:
> (
(sorry about the late follow-up late - I'm traveling most of this
month)
I'm likely missing something obvious, but I find the following to be a
somewhat vague point that has been mentioned more than once in this
thread without a clear explanation. i.e., why is it hard to share a
serializer/deseria
Thanks Jun.
I think we all understand the motivation of adding serialization API back,
but are just proposing different ways of doing such. I personally prefer to
not bind the producer instance with a fixed serialization, but that said I
am fine with the current proposal too as this can still be d
Hi All,
This is very likely when you have large site such as Linked-in and you have
thousand of servers producing data. You will mixed bag of producer and
serialization or deserialization because of incremental code deployment.
So, it is best to keep the API as generic as possible and each org /
> In practice the cases that actually mix serialization types in a single
stream are pretty rare I think just because the consumer then has the
problem of guessing how to deserialize, so most of these will end up with
at least some marker or schema id or whatever that tells you how to read
the data
Thank you Jay. I agree with the issue that you point w.r.t paired
serializers. I also think having mix serialization types is rare. To get
the current behavior, one can simply use a ByteArraySerializer. This is
best understood by talking with many customers and you seem to have done
that. I am conv
Ok, based on all the feedbacks that we have heard, I plan to do the
following.
1. Keep the generic api in KAFKA-1797.
2. Add a new constructor in Producer/Consumer that takes the key and the
value serializer instance.
3. Have KAFKA-1797 reviewed and checked into 0.8.2 and trunk.
This will make it
Hi Jun,
Thanks for pointing out this. Yes, putting serialization/deserialization
code into record does lose some flexibility. Some more thinking, I think
no matter what we do to bind the producer and serializer/deserializer, we
can always to the same thing on Record, i.e. We can also have some
con
Hey Sriram,
Thanks! I think this is a very helpful summary.
Let me try to address your point about passing in the serde at send time.
I think the first objection is really to the paired key/value serializer
interfaces. This leads to kind of a weird combinatorial thing where you
would have an avr
This thread has diverged multiple times now and it would be worth
summarizing them.
There seems to be the following points of discussion -
1. Can we keep the serialization semantics outside the Producer interface
and have simple bytes in / bytes out for the interface (This is what we
have today)
Jiangjie,
The issue with adding the serializer in ProducerRecord is that you need to
implement all combinations of serializers for key and value. So, instead of
just implementing int and string serializers, you will have to implement
all 4 combinations.
Adding a new producer constructor like Prod
Hey Guozhang,
These are good points, let me try to address them.
1. Our goal is to be able to provide a best-of-breed serialization package
that works out of the box that does most of the magic. This best-of-breed
plugin would allow schemas, schema evolution, compatibility checks, etc. We
think i
I agree that having the new Producer(KeySerializer,
ValueSerializer) interface would be useful.
People suggested cases where you want to mix and match serialization types.
The ByteArraySerializer is a no-op that would give the current behavior so
any odd case where you need to mix and match serial
I'm just thinking instead of binding serialization with producer, another
option is to bind serializer/deserializer with
ProducerRecord/ConsumerRecord (please see the detail proposal below.)
The arguments for this option is:
A. A single producer could send different message type
Can you elaborate a bit on what an object API wrapper will look like?
Since the serialization API already exists today, its very easy to
know how I'll use the new producer with serialization - exactly the
same way I use the existing one.
If we are proposing a change that will require significant c
I would prefer making the kafka producer as is and wrap the object API on
top rather than wiring the serializer configs into producers. Some thoughts:
1. For code sharing, I think it may only be effective for though simple
functions such as string serialization, etc. For Avro / Shrift / PB, the
se
Jan, Jason,
First, within an Kafka cluster, it's unlikely that each topic has a
different type serializer. Like Jason mentioned, Square standardizes on
protocol. Many other places such as LinkedIn standardize on Avro.
Second, dealing with bytes only has limited use cases. Other than copying
bytes
Sorry for adding noise, but I think Jan has a very good point: applications
shouldn't be forced to create multiple producers simply to wire-in the
proper Serializer. It's an artificial restriction that wastes resources.
It's a common thing for us to create a single producer and slap different
"vie
In our case, we use protocol buffers for all messages, and these have
simple serialization/deserialization builtin to the protobuf libraries
(e.g. MyProtobufMessage.toByteArray()). Also, we often produce/consume
messages without conversion to/from protobuf Objects (e.g. in cases where
we are just
Hello Everyone,
I would very much appreciate if someone could provide me a real world
examplewhere it is more convenient to implement the serializers instead
of just making sure to provide bytearrays.
The code we came up with explicitly avoids the serializer api. I think
it is common underst
Yeah I am kind of sad about that :(. I just mentioned it to show that there
are material use cases for applications where you expose the underlying
ByteBuffer (I know we were talking about byte arrays) instead of
serializing/deserializing objects - performance is a big one.
On Tue, Dec 2, 2014 a
Rajiv,
That's probably a very special use case. Note that even in the new consumer
api w/o the generics, the client is only going to get the byte array back.
So, you won't be able to take advantage of reusing the ByteBuffer in the
underlying responses.
Thanks,
Jun
On Tue, Dec 2, 2014 at 5:26 PM
I for one use the consumer (Simple Consumer) without any deserialization. I
just take the ByteBuffer wrap it a preallocated flyweight and use it
without creating any objects. I'd ideally not have to wrap this logic in a
deserializer interface. For every one who does do this, it seems like a
very sm
> For (1), yes, but it's easier to make a config change than a code change.
> If you are using a third party library, one may not be able to make any
> code change.
Doesn't that assume that all organizations have to already share the
same underlying specific data type definition (e.g.,
UniversalAv
Rajiv,
Yes, that's possible within an organization. However, if you want to share
that implementation with other organizations, they will have to make code
changes, instead of just a config change.
Thanks,
Jun
On Tue, Dec 2, 2014 at 1:06 PM, Rajiv Kurian wrote:
> Why can't the organization pa
For (1), yes, but it's easier to make a config change than a code change.
If you are using a third party library, one may not be able to make any
code change.
For (2), it's just that if most consumers always do deserialization after
getting the raw bytes, perhaps it would be better to have these t
"It also makes it possible to do validation on the server
side or make other tools that inspect or display messages (e.g. the various
command line tools) and do this in an easily pluggable way across tools."
I agree that it's valuable to have a standard way to plugin serialization
across many tool
> The issue with a separate ser/deser library is that if it's not part of the
> client API, (1) users may not use it or (2) different users may use it in
> different ways. For example, you can imagine that two Avro implementations
> have different ways of instantiation (since it's not enforced by t
Yeah totally, far from preventing it, making it easy to specify/encourage a
custom serializer across your org is exactly the kind of thing I was hoping
to make work well. If there is a config that gives the serializer you can
just default this to what you want people to use as some kind of
environm
Why can't the organization package the Avro implementation with a kafka
client and distribute that library though? The risk of different users
supplying the kafka client with different serializer/deserializer
implementations still exists.
On Tue, Dec 2, 2014 at 12:11 PM, Jun Rao wrote:
> Joel, R
Joel, Rajiv, Thunder,
The issue with a separate ser/deser library is that if it's not part of the
client API, (1) users may not use it or (2) different users may use it in
different ways. For example, you can imagine that two Avro implementations
have different ways of instantiation (since it's no
Thanks for the follow-up Jay. I still don't quite see the issue here
but maybe I just need to process this a bit more. To me "packaging up
the best practice and plug it in" seems to be to expose a simple
low-level API and give people the option to plug in a (possibly
shared) standard serializer in
er in a consistent way, and *that* still needs to be documented and
understood.
Regards,
Thunder
-Original Message-
From: Jay Kreps [mailto:j...@confluent.io]
Sent: Tuesday, December 02, 2014 11:10 AM
To: d...@kafka.apache.org
Cc: users@kafka.apache.org
Subject: Re: [DISCUSSION] adding
Hey Joel, you are right, we discussed this, but I think we didn't think
about it as deeply as we should have. I think our take was strongly shaped
by having a wrapper api at LinkedIn that DOES do the serialization
transparently so I think you are thinking of the producer as just an
implementation d
@kafka.apache.org
Subject: Re: [DISCUSSION] adding the serializer api back to the new java
producer
It's not clear to me from your initial email what exactly can't be done with
the raw accept bytes API. Serialization libraries should be share able outside
of kafka. I honestly like the
It's not clear to me from your initial email what exactly can't be done
with the raw accept bytes API. Serialization libraries should be share able
outside of kafka. I honestly like the simplicity of the raw bytes API and
feel like serialization should just remain outside of the base Kafka APIs.
An
Re: pushing complexity of dealing with objects: we're talking about
just a call to a serialize method to convert the object to a byte
array right? Or is there more to it? (To me) that seems less
cumbersome than having to interact with parameterized types. Actually,
can you explain more clearly what
Joel,
Thanks for the feedback.
Yes, the raw bytes interface is simpler than the Generic api. However, it
just pushes the complexity of dealing with the objects to the application.
We also thought about the layered approach. However, this may confuse the
users since there is no single entry point
> makes it hard to reason about what type of data is being sent to Kafka and
> also makes it hard to share an implementation of the serializer. For
> example, to support Avro, the serialization logic could be quite involved
> since it might need to register the Avro schema in some remote registry a
Yes, that will be a separate release. Possibly an 0.8.2 beta-2, followed by
0.8.2 final.
Thanks,
Jun
On Wed, Nov 26, 2014 at 2:24 AM, Shlomi Hazan wrote:
> Jay, Jun,
> Thank you both for explaining. I understand this is important enough such
> that it must be done, and if so, the sooner the be
Jay, Jun,
Thank you both for explaining. I understand this is important enough such
that it must be done, and if so, the sooner the better.
How will the change be released? a beta-2 or release candidate? I think
that if possible, it should not overrun the already released version.
Thank you guys fo
Bhavesh,
This api change doesn't mean you need to change the format of the encoded
data. It simply moves the serialization logic from the application to a
pluggable serializer. As long as you preserve the serialization logic, the
consumer should still see the same bytes.
If you are talking about
How will mix bag will work with Consumer side ? Entire site can not be
rolled at once so Consumer will have to deals with New and Old Serialize
Bytes ? This could be app team responsibility. Are you guys targeting
0.8.2 release, which may break customer who are already using new producer
API (be
+1 for this change.
what about de-serializer class in 0.8.2? Say i am using new producer with
Avro and old consumer combination.
then i need to give custom Decoder implementation for Avro right?.
On Tue, Nov 25, 2014 at 9:19 PM, Joe Stein wrote:
> The serializer is an expected use of the prod
The serializer is an expected use of the producer/consumer now and think we
should continue that support in the new client. As far as breaking the API
it is why we released the 0.8.2-beta to help get through just these type of
blocking issues in a way that the community at large could be involved i
+1 on this change — APIs are forever. As much as we’d love to see 0.8.2 release
ASAP, it is important to get this right.
-JW
> On Nov 24, 2014, at 5:58 PM, Jun Rao wrote:
>
> Hi, Everyone,
>
> I'd like to start a discussion on whether it makes sense to add the
> serializer api back to the new
53 matches
Mail list logo