For (3) we could also do the following:
- On any retryable producer error, force a metadata refresh (in
handleProducerResponse).
- In handleMetadataResponse, the producer can (internally) close out
connections that are no longer valid (i.e., connections in {old set of
leader brokers} - {new set of leader brokers}).
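As a rough sketch of that pruning step (the class, map, and Connection names below are placeholders, not the actual client internals): after a metadata refresh, drop connections to brokers that are no longer leaders for any partition the producer cares about.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; broker ids stand in for the client's node objects
// and Connection is a placeholder for whatever the network layer uses.
public class StaleConnectionPruner {
    interface Connection { void close(); }

    private final Map<Integer, Connection> connectionsByBroker = new HashMap<>();

    // Called from the metadata-response handling path.
    void onMetadataRefresh(Set<Integer> oldLeaderBrokers, Set<Integer> newLeaderBrokers) {
        Set<Integer> stale = new HashSet<>(oldLeaderBrokers);
        stale.removeAll(newLeaderBrokers); // {old set of leader brokers} - {new set}
        for (Integer brokerId : stale) {
            Connection conn = connectionsByBroker.remove(brokerId);
            if (conn != null)
                conn.close(); // connection is no longer to a valid leader
        }
    }
}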
1. Yes, I think the name could be improved. However that name doesn't
really express what it does. What about RecordSentCallback?
2. As Neha said the nodes are initialized to a random position. Round robin
is preferable to random (lower variance, cheaper to compute, etc). Your
point about skipping
>> (1) if the
replication factor is 1 and there is a broker failure, we can still route
the message, (2) if a bunch of producers are started at the same time, this
prevents them from picking up the same partition in a synchronized way.
You raised a good point, Jun. Regarding #1, shouldn't round ro
Fine with most of the changes. Just a few questions/comments.
1. Would it be better to change Callback to ProducerCallback to distinguish
it from controller callback and potential future other types of callbacks
(e.g. consumer)?
2. If no key is present AND no partition id is present, partitions
w
Also, I thought a bit more about the org.apache thing. This is clearly an
aesthetic thing, but I do feel aesthetics are important.
The argument for the kafka.* package is basically that putting your domain
name in the namespace is a silly cargo cult thing that Java people do. I
have never seen a
Okay I posted a patch against trunk that carries out the refactoring
described above:
https://issues.apache.org/jira/browse/KAFKA-1227
Updated javadoc is here:
http://empathybox.com/kafka-javadoc
This touches a fair number of files as I also improved documentation and
standardized terminology in
About serialization, I am wondering if most people would try to have
customized partitioning mainly for logical reasons, i.e. they want
some messages to go to the same partition; if that is true, they do not
really care about the actual partition id but all they would do is
specify the
It *is* expected that the user call this method on
every single call, and because we no longer control the partitioner
interface there will be no way to control this.
Makes sense. This will ensure that new partitions are detected as defined by
topic.metadata.refresh.interval.ms.
There are a quite
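For reference, a config sketch showing where that refresh interval fits; the broker list value is just a placeholder and the exact property names were still being settled for the new client.

import java.util.Properties;

public class ProducerConfigSketch {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder
        // Upper bound on how long it takes the producer to notice newly added
        // partitions, since topic metadata is re-fetched at this interval.
        props.put("topic.metadata.refresh.interval.ms", "600000"); // e.g. 10 minutes
        return props;
    }
}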
Hey Neha,
Basically partitionsForTopic will invoke
Metadata.fetch(topic).partitionsFor. So it will block on the first request
for a given topic while metadata is loaded. Each subsequent request will
return whatever metadata is present; the logic for refreshing metadata will
remain exactly as it is
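A simplified sketch of that blocking-on-first-request behavior (this is not the real Metadata class; the partition ids and the fetch trigger are stand-ins):

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetadataSketch {
    private final Map<String, List<Integer>> partitionsByTopic = new HashMap<>();

    // Blocks only while nothing at all is known about the topic; once metadata
    // has been loaded, callers get whatever is currently cached.
    public synchronized List<Integer> partitionsFor(String topic) throws InterruptedException {
        while (!partitionsByTopic.containsKey(topic)) {
            requestMetadataFetch(topic); // ask the I/O thread to fetch metadata
            wait();
        }
        return partitionsByTopic.get(topic);
    }

    // Invoked when a metadata response arrives; background refreshes keep
    // calling this without ever blocking senders.
    public synchronized void update(String topic, List<Integer> partitions) {
        partitionsByTopic.put(topic, Collections.unmodifiableList(partitions));
        notifyAll();
    }

    private void requestMetadataFetch(String topic) {
        // placeholder: in the real client this just signals the sender thread
    }
}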
Hey Joel,
I actually went with Future instead of
Future but yes, as you say that object represents the
topic, partition, and offset and would be the place we would put any future
info we need to return to the user.
The problem with implementing cancel is that the record is already
serialized into
1. Change send to use java.util.concurrent.Future in send():
Future send(ProducerRecord record, Callback callback)
The cancel method will always return false and not do anything. Callback
will then be changed to
interface Callback {
    public void onCompletion(RecordPosition position, Exception exception);
}
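A usage sketch of the proposal above; Producer, ProducerRecord, Callback, and RecordPosition are the names used in this thread, and the Future's type parameter here is an assumption, not a finalized signature.

import java.util.concurrent.Future;

// Compiles only against the interfaces proposed in this discussion.
public class SendSketch {
    static void sendOne(Producer producer) throws Exception {
        Future<RecordPosition> result = producer.send(
            new ProducerRecord("my-topic", "key".getBytes(), "value".getBytes()),
            new Callback() {
                public void onCompletion(RecordPosition position, Exception exception) {
                    if (exception != null)
                        exception.printStackTrace(); // send failed after retries
                }
            });
        RecordPosition position = result.get(); // or block for the acknowledged position
    }
}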
>
> > In order to allow correctly choosing a partition the Producer interface
> > will include a new method:
> > List<PartitionInfo> partitionsForTopic(String topic);
> > PartitionInfo will be changed to include the actual Node objects not just
> > the Node ids.
>
> Why are the node id's alone insufficient? Y
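For illustration, roughly the shape being discussed: PartitionInfo carrying full Node objects (host and port) rather than bare broker ids, which is what would let a partitioner reason about connections or locality. The field names here are assumptions.

import java.util.List;

public class PartitionInfo {
    public static class Node {
        public final int id;
        public final String host;
        public final int port;
        public Node(int id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }
    }

    public final String topic;
    public final int partition;
    public final Node leader;          // full node, not just a broker id
    public final List<Node> replicas;

    public PartitionInfo(String topic, int partition, Node leader, List<Node> replicas) {
        this.topic = topic;
        this.partition = partition;
        this.leader = leader;
        this.replicas = replicas;
    }
}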
> 1. Change send to use java.util.concurrent.Future in send():
> Future send(ProducerRecord record, Callback callback)
+0 on Future vs the custom future: I personally don't find too much of
a difference between a custom future and the standard future; (even if
it means having to put up with the
Hey folks,
Thanks for all the excellent suggestions on the producer API, I think this
really made things better. We'll do a similar thing for the consumer as we
get a proposal together. I wanted to summarize everything I heard and the
proposed changes I plan to do versus ignore. I'd like to get fe
On 1/30/14 8:18 PM, Joel Koshy wrote:
That's a good point about 1A - does seem that we would need to have
some kind of TTL for each topic's metadata.
Also, WRT ZK dependency I don't think that decision (for the Java
client) affects other clients. i.e., other client implementations can
use whate
I thought a bit about it and I think the getCluster() thing was overly
simplistic: we try to only maintain metadata about the current set of
topics the producer cares about, so the cluster might not have the
partitions for the topic the user cares about. I think actually what we
need is a new
+ dev
(this thread has become a bit unwieldy)
On Thu, Jan 30, 2014 at 5:15 PM, Joel Koshy wrote:
> Does it preclude those various implementations? i.e., it could become
> a producer config:
> default.partitioner.strategy="minimize-connections"/"roundrobin" - and
> so on; and implement those par
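A sketch of the config idea quoted above; the property name and values are the ones proposed in this thread, not an implemented setting.

import java.util.Properties;

public class PartitionerConfigSketch {
    static Properties props() {
        Properties props = new Properties();
        props.put("default.partitioner.strategy", "roundrobin");
        // or: props.put("default.partitioner.strategy", "minimize-connections");
        return props;
    }
}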
That's a good point about 1A - does seem that we would need to have
some kind of TTL for each topic's metadata.
Also, WRT ZK dependency I don't think that decision (for the Java
client) affects other clients. i.e., other client implementations can
use whatever discovery mechanism they choose. That
With option 1A, if we increase # partitions on a topic, how will the
producer find out newly created partitions? Do we expect the producer to
periodically call getCluster()?
As for the ZK dependency, one of the goals of the client rewrite is to reduce
dependencies so that one can implement the client in l
One downside to the 1A proposal is that without a Partitioner interface we
can't really package up and provide common partitioner implementations.
Examples of these would be
1. HashPartitioner - The default hash partitioning
2. RoundRobinPartitioner - Just round-robins over partitions
3. ConnectionM
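A sketch of what such a pluggable Partitioner and a couple of the implementations listed above might look like; the interface shape (key plus the topic's partition list) is an assumption based on this discussion, and the random starting offset follows the earlier point about producers started at the same time.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

interface Partitioner {
    int partition(byte[] key, List<Integer> partitions);
}

// 1. Default hash partitioning: the same key always lands on the same partition.
class HashPartitioner implements Partitioner {
    public int partition(byte[] key, List<Integer> partitions) {
        return partitions.get(Math.abs(Arrays.hashCode(key) % partitions.size()));
    }
}

// 2. Round robin: ignore the key and cycle through the partitions in order.
//    Starting at a random position keeps producers launched at the same time
//    from picking the same partition in lockstep.
class RoundRobinPartitioner implements Partitioner {
    private final AtomicInteger counter = new AtomicInteger(new java.util.Random().nextInt(1024));

    public int partition(byte[] key, List<Integer> partitions) {
        return partitions.get(Math.abs(counter.getAndIncrement() % partitions.size()));
    }
}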
Clark and all,
I thought a little bit about the serialization question. Here are the
options I see and the pros and cons I can think of. I'd love to hear
people's preferences if you have a strong one.
One important consideration is that however the producer works will also
need to be how the new
Yeah I'll fix that name.
Hmm, yeah, I agree that often you want to be able to delay network
connectivity until you have started everything up. But at the same time I
kind of loathe special init() methods because you always forget to call them
and get one round of errors every time. I wonder if in those
Jay,
Thanks for the explanation. I didn't realize that the broker list was for
bootstrapping and was not required to be a complete list of all brokers
(although I see now that it's clearly stated in the text description of the
parameter). Nonetheless, does it still make sense to make the config
Roger,
These are good questions.
1. The producer has actually been zookeeper-free since 0.8, so this is not new
to this client; it is true for the current client as well. Our experience was
that direct zookeeper connections from zillions of producers weren't a good
idea for a number of reasons. Our inten
A couple comments:
1) Why does the config use a broker list instead of discovering the brokers
in ZooKeeper? It doesn't match the HighLevelConsumer API.
2) It looks like broker connections are created on demand. I'm wondering
if sometimes you might want to flush out config or network connectivi
As mentioned in a previous email we are working on a re-implementation of
the producer. I would like to use this email thread to discuss the details
of the public API and the configuration. I would love for us to be
incredibly picky about this public api now so it is as good as possible and
we don'