Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-17 Thread Jun Rao
We still have a few blockers to fix in 0.8.2. When that's done, we can discuss whether to do another 0.8.2 beta or just do the 0.8.2 final release. Thanks, Jun On Wed, Dec 17, 2014 at 5:29 PM, Shannon Lloyd wrote: > > Are you guys planning another beta for everyone to try out the changes > befo

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-17 Thread Shannon Lloyd
Are you guys planning another beta for everyone to try out the changes before you cut 0.8.2 final? Cheers, Shannon On 18 December 2014 at 11:24, Rajiv Kurian wrote: > > Has the mvn repo been updated too? > > On Wed, Dec 17, 2014 at 4:31 PM, Jun Rao wrote: > > > > Thanks everyone for the feedbac

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-17 Thread Rajiv Kurian
Has the mvn repo been updated too? On Wed, Dec 17, 2014 at 4:31 PM, Jun Rao wrote: > > Thanks everyone for the feedback and the discussion. The proposed changes > have been checked into both 0.8.2 and trunk. > > Jun > > On Tue, Dec 16, 2014 at 10:43 PM, Joel Koshy wrote: > > > > Jun, > > > > Tha

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-17 Thread Jun Rao
Thanks everyone for the feedback and the discussion. The proposed changes have been checked into both 0.8.2 and trunk. Jun On Tue, Dec 16, 2014 at 10:43 PM, Joel Koshy wrote: > > Jun, > > Thanks for summarizing this - it helps confirm for me that I did not > misunderstand anything in this thread

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-16 Thread Joel Koshy
Jun, Thanks for summarizing this - it helps confirm for me that I did not misunderstand anything in this thread so far; and that I disagree with the premise that the steps in using the current byte-oriented API are cumbersome or inflexible. It involves instantiating the K-V serializers in code (as

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-16 Thread Jun Rao
Joel, With a byte array interface, of course there is nothing that one can't do. However, the real question is whether we want to encourage people to use it this way or not. Being able to flow just bytes is definitely easier to get started. That's why many early adopters choose to do it that

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-15 Thread Joel Koshy
Documentation is inevitable even if the serializer/deserializer is part of the API - since the user has to set it up in the configs. So again, you can only encourage people to use it through documentation. The simpler byte-oriented API seems clearer to me because anyone who needs to send (or receiv

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-15 Thread Jun Rao
Joel, It's just that if the serializer/deserializer is not part of the API, you can only encourage people to use it through documentation. However, not everyone will read the documentation if it's not directly used in the API. Thanks, Jun On Mon, Dec 15, 2014 at 2:11 AM, Joel Koshy wrote: > (

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-15 Thread Joel Koshy
(sorry about the late follow-up - I'm traveling most of this month) I'm likely missing something obvious, but I find the following to be a somewhat vague point that has been mentioned more than once in this thread without a clear explanation. i.e., why is it hard to share a serializer/deseria

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-11 Thread Guozhang Wang
Thanks Jun. I think we all understand the motivation for adding the serialization API back, but are just proposing different ways of doing so. I personally prefer not to bind the producer instance to a fixed serialization, but that said I am fine with the current proposal too as this can still be d

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-09 Thread Bhavesh Mistry
Hi All, This is very likely when you have a large site such as LinkedIn and you have thousands of servers producing data. You will have a mixed bag of producers and serialization or deserialization because of incremental code deployment. So, it is best to keep the API as generic as possible and each org /

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-09 Thread Steven Wu
> In practice the cases that actually mix serialization types in a single stream are pretty rare I think just because the consumer then has the problem of guessing how to deserialize, so most of these will end up with at least some marker or schema id or whatever that tells you how to read the data

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-08 Thread Sriram Subramanian
Thank you Jay. I agree with the issue that you point out w.r.t. paired serializers. I also think having mixed serialization types is rare. To get the current behavior, one can simply use a ByteArraySerializer. This is best understood by talking with many customers and you seem to have done that. I am conv

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-08 Thread Jun Rao
Ok, based on all the feedback that we have heard, I plan to do the following. 1. Keep the generic api in KAFKA-1797. 2. Add a new constructor in Producer/Consumer that takes the key and the value serializer instance. 3. Have KAFKA-1797 reviewed and checked into 0.8.2 and trunk. This will make it

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-05 Thread Jiangjie Qin
Hi Jun, Thanks for pointing this out. Yes, putting serialization/deserialization code into record does lose some flexibility. After some more thinking, I think no matter what we do to bind the producer and serializer/deserializer, we can always do the same thing on Record, i.e. we can also have some con

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-05 Thread Jay Kreps
Hey Sriram, Thanks! I think this is a very helpful summary. Let me try to address your point about passing in the serde at send time. I think the first objection is really to the paired key/value serializer interfaces. This leads to kind of a weird combinatorial thing where you would have an avr

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-05 Thread Sriram Subramanian
This thread has diverged multiple times now and it would be worth summarizing the divergences. There seem to be the following points of discussion - 1. Can we keep the serialization semantics outside the Producer interface and have simple bytes in / bytes out for the interface (This is what we have today)

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-05 Thread Jun Rao
Jiangjie, The issue with adding the serializer in ProducerRecord is that you need to implement all combinations of serializers for key and value. So, instead of just implementing int and string serializers, you will have to implement all 4 combinations. Adding a new producer constructor like Prod
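
[Illustration, not part of the original message] The combinatorial point above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the `Serializer` interface, `SketchProducer`, and `encode` are hypothetical stand-ins invented for this example, not the Kafka API discussed in the thread. It shows that when key and value serializers are separate instances handed to a constructor, two implementations (int and string) can cover all four key/value pairings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SerializerCombos {
    /** Hypothetical one-type serializer; an illustration, not Kafka's interface. */
    interface Serializer<T> {
        byte[] serialize(T data);
    }

    // Only two implementations are written: one for ints, one for strings.
    static final Serializer<Integer> INT = i -> ByteBuffer.allocate(4).putInt(i).array();
    static final Serializer<String> STRING = s -> s.getBytes(StandardCharsets.UTF_8);

    /** Hypothetical stand-in for a producer constructed with serializer instances. */
    static final class SketchProducer<K, V> {
        private final Serializer<K> keySerializer;
        private final Serializer<V> valueSerializer;

        SketchProducer(Serializer<K> keySerializer, Serializer<V> valueSerializer) {
            this.keySerializer = keySerializer;
            this.valueSerializer = valueSerializer;
        }

        /** Returns {keyBytes, valueBytes}; nothing is sent over a network here. */
        byte[][] encode(K key, V value) {
            return new byte[][] {keySerializer.serialize(key), valueSerializer.serialize(value)};
        }
    }

    public static void main(String[] args) {
        // The same two serializers are reused for all four key/value pairings.
        byte[][] intString = new SketchProducer<>(INT, STRING).encode(42, "v");
        byte[][] stringInt = new SketchProducer<>(STRING, INT).encode("k", 7);
        byte[][] intInt = new SketchProducer<>(INT, INT).encode(1, 2);
        byte[][] stringString = new SketchProducer<>(STRING, STRING).encode("k", "v");
        System.out.println(intString[0].length + " " + stringInt[0].length + " "
                + intInt[1].length + " " + stringString[1].length);
    }
}
```

A paired key/value serializer interface would, by contrast, need one class per pairing, which is the cost the message describes.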

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Jay Kreps
Hey Guozhang, These are good points, let me try to address them. 1. Our goal is to be able to provide a best-of-breed serialization package that works out of the box that does most of the magic. This best-of-breed plugin would allow schemas, schema evolution, compatibility checks, etc. We think i

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Jay Kreps
I agree that having the new Producer(KeySerializer, ValueSerializer) interface would be useful. People suggested cases where you want to mix and match serialization types. The ByteArraySerializer is a no-op that would give the current behavior so any odd case where you need to mix and match serial
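
[Illustration, not part of the original message] A no-op byte-array serializer of the kind mentioned above might look roughly like this. It is a sketch under assumptions: the `Serializer` interface, the `BYTES` and `STRING` constants, and `payloadSize` are hypothetical names made up for this example. It shows a caller that has already encoded its data mixing a pass-through serializer with a typed one.

```java
import java.nio.charset.StandardCharsets;

final class PassThroughSketch {
    /** Hypothetical one-type serializer; an illustration, not Kafka's interface. */
    interface Serializer<T> {
        byte[] serialize(T data);
    }

    /** No-op: the caller's bytes are handed on unchanged. */
    static final Serializer<byte[]> BYTES = data -> data;

    /** A typed serializer, for contrast. */
    static final Serializer<String> STRING = s -> s.getBytes(StandardCharsets.UTF_8);

    /** Serializes key and value independently and reports the combined size. */
    static <K, V> int payloadSize(Serializer<K> keySer, Serializer<V> valueSer, K key, V value) {
        return keySer.serialize(key).length + valueSer.serialize(value).length;
    }

    public static void main(String[] args) {
        byte[] preEncoded = "already-encoded".getBytes(StandardCharsets.UTF_8);
        // The pass-through serializer returns the very same array it was given.
        System.out.println(BYTES.serialize(preEncoded) == preEncoded);
        // A string key combined with a value the application encoded itself.
        System.out.println(payloadSize(STRING, BYTES, "key", preEncoded));
    }
}
```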

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Jiangjie Qin
I'm just thinking instead of binding serialization with producer, another option is to bind serializer/deserializer with ProducerRecord/ConsumerRecord (please see the detailed proposal below.) The arguments for this option are: A. A single producer could send different message type

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Gwen Shapira
Can you elaborate a bit on what an object API wrapper will look like? Since the serialization API already exists today, it's very easy to know how I'll use the new producer with serialization - exactly the same way I use the existing one. If we are proposing a change that will require significant c

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Guozhang Wang
I would prefer keeping the kafka producer as is and wrapping the object API on top rather than wiring the serializer configs into producers. Some thoughts: 1. For code sharing, I think it may only be effective for those simple functions such as string serialization, etc. For Avro / Thrift / PB, the se

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Jun Rao
Jan, Jason, First, within a Kafka cluster, it's unlikely that each topic has a different type of serializer. Like Jason mentioned, Square standardizes on protocol. Many other places such as LinkedIn standardize on Avro. Second, dealing with bytes only has limited use cases. Other than copying bytes

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Philippe Laflamme
Sorry for adding noise, but I think Jan has a very good point: applications shouldn't be forced to create multiple producers simply to wire-in the proper Serializer. It's an artificial restriction that wastes resources. It's a common thing for us to create a single producer and slap different "vie

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jason Rosenberg
In our case, we use protocol buffers for all messages, and these have simple serialization/deserialization builtin to the protobuf libraries (e.g. MyProtobufMessage.toByteArray()). Also, we often produce/consume messages without conversion to/from protobuf Objects (e.g. in cases where we are just

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jan Filipiak
Hello Everyone, I would very much appreciate it if someone could provide me a real world example where it is more convenient to implement the serializers instead of just making sure to provide byte arrays. The code we came up with explicitly avoids the serializer api. I think it is common underst

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Rajiv Kurian
Yeah I am kind of sad about that :(. I just mentioned it to show that there are material use cases for applications where you expose the underlying ByteBuffer (I know we were talking about byte arrays) instead of serializing/deserializing objects - performance is a big one. On Tue, Dec 2, 2014 a

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jun Rao
Rajiv, That's probably a very special use case. Note that even in the new consumer api w/o the generics, the client is only going to get the byte array back. So, you won't be able to take advantage of reusing the ByteBuffer in the underlying responses. Thanks, Jun On Tue, Dec 2, 2014 at 5:26 PM

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Rajiv Kurian
I for one use the consumer (Simple Consumer) without any deserialization. I just take the ByteBuffer, wrap it in a preallocated flyweight and use it without creating any objects. I'd ideally not have to wrap this logic in a deserializer interface. For everyone who does do this, it seems like a very sm
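
[Illustration, not part of the original message] The flyweight approach described above could be sketched along these lines. The message layout (an 8-byte id followed by a 4-byte count), the `EventFlyweight` class, and its accessors are all assumptions invented for this example; the original message does not describe its actual layout. The idea shown is that one preallocated accessor object is re-pointed at each message inside a buffer, so fields are read in place without allocating a deserialized object per message.

```java
import java.nio.ByteBuffer;

final class FlyweightSketch {
    /** Size of one message in the assumed layout: long id (8 bytes) + int count (4 bytes). */
    static final int MESSAGE_SIZE = 12;

    /** Reusable view over a message stored in a ByteBuffer; holds no copy of the data. */
    static final class EventFlyweight {
        private ByteBuffer buffer;
        private int offset;

        /** Re-points this view at the message starting at the given offset. */
        EventFlyweight wrap(ByteBuffer buffer, int offset) {
            this.buffer = buffer;
            this.offset = offset;
            return this;
        }

        long id() {
            return buffer.getLong(offset);
        }

        int count() {
            return buffer.getInt(offset + 8);
        }
    }

    public static void main(String[] args) {
        // Two back-to-back messages, written with absolute puts.
        ByteBuffer payload = ByteBuffer.allocate(2 * MESSAGE_SIZE);
        payload.putLong(0, 7L).putInt(8, 3);
        payload.putLong(12, 9L).putInt(20, 5);

        // One flyweight instance is allocated once and reused for every message.
        EventFlyweight event = new EventFlyweight();
        for (int offset = 0; offset < payload.capacity(); offset += MESSAGE_SIZE) {
            event.wrap(payload, offset);
            System.out.println(event.id() + " " + event.count());
        }
    }
}
```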

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Joel Koshy
> For (1), yes, but it's easier to make a config change than a code change. > If you are using a third party library, one may not be able to make any > code change. Doesn't that assume that all organizations have to already share the same underlying specific data type definition (e.g., UniversalAv

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jun Rao
Rajiv, Yes, that's possible within an organization. However, if you want to share that implementation with other organizations, they will have to make code changes, instead of just a config change. Thanks, Jun On Tue, Dec 2, 2014 at 1:06 PM, Rajiv Kurian wrote: > Why can't the organization pa

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jun Rao
For (1), yes, but it's easier to make a config change than a code change. If you are using a third party library, one may not be able to make any code change. For (2), it's just that if most consumers always do deserialization after getting the raw bytes, perhaps it would be better to have these t

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Roger Hoover
"It also makes it possible to do validation on the server side or make other tools that inspect or display messages (e.g. the various command line tools) and do this in an easily pluggable way across tools." I agree that it's valuable to have a standard way to plugin serialization across many tool

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Joel Koshy
> The issue with a separate ser/deser library is that if it's not part of the > client API, (1) users may not use it or (2) different users may use it in > different ways. For example, you can imagine that two Avro implementations > have different ways of instantiation (since it's not enforced by t

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jay Kreps
Yeah totally, far from preventing it, making it easy to specify/encourage a custom serializer across your org is exactly the kind of thing I was hoping to make work well. If there is a config that gives the serializer you can just default this to what you want people to use as some kind of environm

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Rajiv Kurian
Why can't the organization package the Avro implementation with a kafka client and distribute that library though? The risk of different users supplying the kafka client with different serializer/deserializer implementations still exists. On Tue, Dec 2, 2014 at 12:11 PM, Jun Rao wrote: > Joel, R

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jun Rao
Joel, Rajiv, Thunder, The issue with a separate ser/deser library is that if it's not part of the client API, (1) users may not use it or (2) different users may use it in different ways. For example, you can imagine that two Avro implementations have different ways of instantiation (since it's no

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Joel Koshy
Thanks for the follow-up Jay. I still don't quite see the issue here but maybe I just need to process this a bit more. To me "packaging up the best practice and plug it in" seems to be to expose a simple low-level API and give people the option to plug in a (possibly shared) standard serializer in

RE: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Thunder Stumpges
er in a consistent way, and *that* still needs to be documented and understood. Regards, Thunder -Original Message- From: Jay Kreps [mailto:j...@confluent.io] Sent: Tuesday, December 02, 2014 11:10 AM To: d...@kafka.apache.org Cc: users@kafka.apache.org Subject: Re: [DISCUSSION] adding

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jay Kreps
Hey Joel, you are right, we discussed this, but I think we didn't think about it as deeply as we should have. I think our take was strongly shaped by having a wrapper api at LinkedIn that DOES do the serialization transparently so I think you are thinking of the producer as just an implementation d

RE: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Thunder Stumpges
@kafka.apache.org Subject: Re: [DISCUSSION] adding the serializer api back to the new java producer It's not clear to me from your initial email what exactly can't be done with the raw accept bytes API. Serialization libraries should be share able outside of kafka. I honestly like the

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Rajiv Kurian
It's not clear to me from your initial email what exactly can't be done with the raw accept bytes API. Serialization libraries should be shareable outside of kafka. I honestly like the simplicity of the raw bytes API and feel like serialization should just remain outside of the base Kafka APIs. An

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Joel Koshy
Re: pushing complexity of dealing with objects: we're talking about just a call to a serialize method to convert the object to a byte array right? Or is there more to it? (To me) that seems less cumbersome than having to interact with parameterized types. Actually, can you explain more clearly what

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jun Rao
Joel, Thanks for the feedback. Yes, the raw bytes interface is simpler than the Generic api. However, it just pushes the complexity of dealing with the objects to the application. We also thought about the layered approach. However, this may confuse the users since there is no single entry point

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Joel Koshy
> makes it hard to reason about what type of data is being sent to Kafka and > also makes it hard to share an implementation of the serializer. For > example, to support Avro, the serialization logic could be quite involved > since it might need to register the Avro schema in some remote registry a

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-11-26 Thread Jun Rao
Yes, that will be a separate release. Possibly an 0.8.2 beta-2, followed by 0.8.2 final. Thanks, Jun On Wed, Nov 26, 2014 at 2:24 AM, Shlomi Hazan wrote: > Jay, Jun, > Thank you both for explaining. I understand this is important enough such > that it must be done, and if so, the sooner the be

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-11-26 Thread Shlomi Hazan
Jay, Jun, Thank you both for explaining. I understand this is important enough such that it must be done, and if so, the sooner the better. How will the change be released? a beta-2 or release candidate? I think that if possible, it should not overrun the already released version. Thank you guys fo

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-11-25 Thread Jun Rao
Bhavesh, This api change doesn't mean you need to change the format of the encoded data. It simply moves the serialization logic from the application to a pluggable serializer. As long as you preserve the serialization logic, the consumer should still see the same bytes. If you are talking about

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-11-25 Thread Bhavesh Mistry
How will a mixed bag work on the Consumer side? The entire site cannot be rolled at once, so the Consumer will have to deal with New and Old Serialized Bytes? This could be the app team's responsibility. Are you guys targeting 0.8.2 release, which may break customer who are already using new producer API (be

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-11-25 Thread Manikumar Reddy
+1 for this change. What about the de-serializer class in 0.8.2? Say I am using the new producer with Avro and the old consumer in combination. Then I need to give a custom Decoder implementation for Avro, right? On Tue, Nov 25, 2014 at 9:19 PM, Joe Stein wrote: > The serializer is an expected use of the prod

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-11-25 Thread Joe Stein
The serializer is an expected use of the producer/consumer now and I think we should continue that support in the new client. As far as breaking the API, it is why we released the 0.8.2-beta to help get through just these types of blocking issues in a way that the community at large could be involved i

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-11-25 Thread Jonathan Weeks
+1 on this change — APIs are forever. As much as we’d love to see 0.8.2 release ASAP, it is important to get this right. -JW > On Nov 24, 2014, at 5:58 PM, Jun Rao wrote: > > Hi, Everyone, > > I'd like to start a discussion on whether it makes sense to add the > serializer api back to the new