Re: Proposed Changes To New Producer Public API

2014-02-03 Thread Joel Koshy
For (3) we could also do the following: - On any retryable producer error, force a metadata refresh (in handleProducerResponse). - In handleMetadataResponse, the producer can (internally) close out connections that are no longer valid. (i.e., connections to {old set of leader brokers} - {new

Re: Proposed Changes To New Producer Public API

2014-02-03 Thread Jay Kreps
1. Yes, I think the name could be improved. However that name doesn't really express what it does. What about RecordSentCallback? 2. As Neha said the nodes are initialized to a random position. Round robin is preferable to random (lower variance, cheaper to compute, etc). Your point about skipping

Re: Proposed Changes To New Producer Public API

2014-02-03 Thread Neha Narkhede
>> (1) if the replication factor is 1 and there is a broker failure, we can still route the message, (2) if a bunch of producers are started at the same time, this prevents them from picking up the same partition in a synchronized way. You raised a good point, Jun. Regarding #1, shouldn't round ro

Re: Proposed Changes To New Producer Public API

2014-02-03 Thread Jun Rao
Fine with most of the changes. Just a few questions/comments. 1. Would it be better to change Callback to ProducerCallback to distinguish it from controller callback and potential future other types of callbacks (e.g. consumer)? 2. If no key is present AND no partition id is present, partitions w

Re: Proposed Changes To New Producer Public API

2014-02-03 Thread Jay Kreps
Also, I thought a bit more about the org.apache thing. This is clearly an aesthetic thing, but I do feel aesthetics are important. The arguments for the kafka.* package is basically that putting your domain name in the namespace is a silly cargo cult thing that Java people do. I have never seen a

Re: Proposed Changes To New Producer Public API

2014-02-02 Thread Jay Kreps
Okay I posted a patch against trunk that carries out the refactoring described above: https://issues.apache.org/jira/browse/KAFKA-1227 Updated javadoc is here: http://empathybox.com/kafka-javadoc This touches a fair number of files as I also improved documentation and standardized terminology in

Re: Proposed Changes To New Producer Public API

2014-02-02 Thread Guozhang Wang
About serialization, I am wondering if most people would try to have a customized partioning mainly due to logic motivations, such that they want some messages to go to the same partition; if that is true, they do not really care about the actually partition id but all they would do is specify the

Re: Proposed Changes To New Producer Public API

2014-02-02 Thread Neha Narkhede
It *is* expected that the user call this method on every single call and because we no longer control the partitioner interface there will be no way to control this. Make sense. This will ensure that new partitions are detected as defined by topic.metadata.refresh.interval.ms. There are a quite

Re: Proposed Changes To New Producer Public API

2014-02-02 Thread Jay Kreps
Hey Neha, Basically partitionsForTopic will invoke Metadata.fetch(topic).partitionsFor. So it will block on the first request for a given topic while metadata is loaded. Each subsequent request will return whatever metadata is present, the logic for refreshing metadata will remain exactly as it is

Re: Proposed Changes To New Producer Public API

2014-02-02 Thread Jay Kreps
Hey Joel, I actually went with Future instead of Future but yes, as you say that object represents the topic, partition, and offset and would be the place we would put any future info we need to return to the user. The problem with implementing cancel is that the record is already serialized into

Re: Proposed Changes To New Producer Public API

2014-02-01 Thread Neha Narkhede
1. Change send to use java.util.concurrent.Future in send(): Future send(ProducerRecord record, Callback callback) The cancel method will always return false and not do anything. Callback will then be changed to interface Callback { public void onCompletion(RecordPosition position, Exceptio

Re: Proposed Changes To New Producer Public API

2014-02-01 Thread Joel Koshy
. > > > In order to allow correctly choosing a partition the Producer interface > > will include a new method: > > List partitionsForTopic(String topic); > > PartitionInfo will be changed to include the actual Node objects not just > > the Node ids. > > Why are the node id's alone insufficient? Y

Re: Proposed Changes To New Producer Public API

2014-02-01 Thread Joel Koshy
> 1. Change send to use java.util.concurrent.Future in send(): > Future send(ProducerRecord record, Callback callback) +0 on Future vs the custom future: I personally don't find too much of a difference between a custom future and the standard future; (even if it means having to put up with the

Proposed Changes To New Producer Public API

2014-01-31 Thread Jay Kreps
Hey folks, Thanks for all the excellent suggestions on the producer API, I think this really made things better. We'll do a similar thing for the consumer as we get a proposal together. I wanted to summarize everything I heard and the proposed changes I plan to do versus ignore. I'd like to get fe

Re: New Producer Public API

2014-01-31 Thread David Arthur
On 1/30/14 8:18 PM, Joel Koshy wrote: That's a good point about 1A - does seem that we would need to have some kind of TTL for each topic's metadata. Also, WRT ZK dependency I don't think that decision (for the Java client) affects other clients. i.e., other client implementations can use whate

Re: New Producer Public API

2014-01-30 Thread Jay Kreps
I thought a bit about it and I think the getCluster() thing was overly simplistic because we try to only maintain metadata about the current set of topics the producer cares about so the cluster might not have the partitions for the topic the user cares about. I think actually what we need is a new

Re: New Producer Public API

2014-01-30 Thread Joel Koshy
+ dev (this thread has become a bit unwieldy) On Thu, Jan 30, 2014 at 5:15 PM, Joel Koshy wrote: > Does it preclude those various implementations? i.e., it could become > a producer config: > default.partitioner.strategy="minimize-connections"/"roundrobin" - and > so on; and implement those par

Re: New Producer Public API

2014-01-30 Thread Joel Koshy
That's a good point about 1A - does seem that we would need to have some kind of TTL for each topic's metadata. Also, WRT ZK dependency I don't think that decision (for the Java client) affects other clients. i.e., other client implementations can use whatever discovery mechanism it chooses. That

Re: New Producer Public API

2014-01-30 Thread Jun Rao
With option 1A, if we increase # partitions on a topic, how will the producer find out newly created partitions? Do we expect the producer to periodically call getCluster()? As for ZK dependency, one of the goals of client rewrite is to reduce dependencies so that one can implement the client in l

Re: New Producer Public API

2014-01-30 Thread Jay Kreps
One downside to the 1A proposal is that without a Partitioner interface we can't really package up and provide common partitioner implementations. Example of these would be 1. HashPartitioner - The default hash partitioning 2. RoundRobinPartitioner - Just round-robins over partitions 3. ConnectionM

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
Clark and all, I thought a little bit about the serialization question. Here are the options I see and the pros and cons I can think of. I'd love to hear people's preferences if you have a strong one. One important consideration is that however the producer works will also need to be how the new

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
Yeah I'll fix that name. Hmm, yeah, I agree that often you want to be able delay network connectivity until you have started everything up. But at the same time I kind of loath special init() methods because you always forget to call them and get one round of error every time. I wonder if in those

Re: New Producer Public API

2014-01-24 Thread Roger Hoover
Jay, Thanks for the explanation. I didn't realize that the broker list was for bootstrapping and was not required to be a complete list of all brokers (although I see now that it's clearly stated in the text description of the parameter). Nonetheless, does it still make sense to make the config

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
Roger, These are good questions. 1. The producer since 0.8 is actually zookeeper free, so this is not new to this client it is true for the current client as well. Our experience was that direct zookeeper connections from zillions of producers wasn't a good idea for a number of reasons. Our inten

Re: New Producer Public API

2014-01-24 Thread Roger Hoover
A couple comments: 1) Why does the config use a broker list instead of discovering the brokers in ZooKeeper? It doesn't match the HighLevelConsumer API. 2) It looks like broker connections are created on demand. I'm wondering if sometimes you might want to flush out config or network connectivi

New Producer Public API

2014-01-24 Thread Jay Kreps
As mentioned in a previous email we are working on a re-implementation of the producer. I would like to use this email thread to discuss the details of the public API and the configuration. I would love for us to be incredibly picky about this public api now so it is as good as possible and we don'