Re: Hi, Can anyone tell me why I cannot produce any message to remote kafka server from my local machine

2013-10-06 Thread Yi Jiang
Hi, can anyone give me an example, or the props settings? I am using the
latest version of Kafka.

Appreciate!

Sent from my iPhone

On 2013-10-05, at 15:05, Jiang Jacky  wrote:

> Hi, I tried to set host.name in server.properties, but it doesn't work.
> I believe it is a network security issue. When I create a new instance
> in the same security group (without Kafka or ZooKeeper), it works: it can
> still produce to the Kafka server. But when I switch to another EC2 account
> and create the same instance, it cannot produce to the Kafka server.
> I noticed that there is no outbound port setting in the security group
> configuration of the Kafka server's EC2 instance. Do we have to open an
> outbound port in the firewall?
> Do you guys know which outbound port should be opened for the Kafka server?
> Thanks
> 
> 
> 2013/10/5 Guozhang Wang 
>> Hello Jacky,
>> 
>> Have you read this FAQ:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-OnEC2%2Cwhycan%27tmyhighlevelconsumersconnecttothebrokers%3F
>> 
>> Guozhang
>> 
>> 
>> On Fri, Oct 4, 2013 at 10:41 PM, Jiang Jacky  wrote:
>> 
>> > It is very weird. I have a Kafka cluster in EC2, and there is no problem
>> > producing messages from one of the nodes with the same producer. But when
>> > I move the producer to my local machine at home, it gives me the error
>> > below: Failed to send messages after 3 tries.
>> >
>> > Can anyone tell me how to fix this issue? I have opened all ports on my
>> > machine at home, and the security group is also open for the Kafka server
>> > and ZooKeeper in EC2. Everything is fine, but I just cannot send any
>> > message to the Kafka server.
>> > Please help me.
>> > Thanks Everyone!
>> >
>> > Jacky
>> >
>> 
>> 
>> 
>> --
>> -- Guozhang
> 
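
To the question above about an example of the props settings: a minimal sketch,
assuming the 0.8 Java producer. The broker host, port, and topic are placeholders;
on the broker side, host.name in server.properties has to resolve from the
producer's network, and the security group needs to allow inbound TCP on the
broker port (9092 by default). Clients open connections out to the brokers (and
consumers to ZooKeeper on 2181), so the broker side usually does not need extra
outbound rules.

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class RemoteProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder: must be an address that resolves and is reachable from
            // outside EC2 (e.g. the instance's public DNS), matching what the
            // broker advertises via host.name in server.properties.
            props.put("metadata.broker.list", "ec2-xx-xx-xx-xx.compute-1.amazonaws.com:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("request.required.acks", "1");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("test-topic", "hello from home"));
            producer.close();
        }
    }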


Re: Managing Millions of Partitions in Kafka

2013-10-06 Thread Neha Narkhede
Kafka is designed to support on the order of a few thousand partitions,
roughly less than 10,000, and the main bottleneck is ZooKeeper. A better
way to design such a system is to have fewer partitions and use keyed
messages to distribute the data over a fixed set of partitions.

Thanks,
Neha
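
For what it's worth, a sketch of that keyed-message approach with the 0.8 Java
producer. The broker address, topic name, and the choice of 100 partitions are
hypothetical; the default partitioner hashes the message key over the topic's
fixed partition count, so all messages for a given user id end up in one partition.

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class KeyedUserEvents {
        // Roughly the hash-mod mapping the default partitioner applies: it tells
        // you which of the fixed partitions a given user's messages land in.
        static int partitionFor(String userId, int numPartitions) {
            return (userId.hashCode() & 0x7fffffff) % numPartitions;
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");               // placeholder
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("key.serializer.class", "kafka.serializer.StringEncoder");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            // One topic with a fixed number of partitions (say 100) instead of
            // one partition per user; the key decides the partition.
            producer.send(new KeyedMessage<String, String>("user-events", "user-42", "login"));
            producer.close();

            System.out.println("user-42 -> partition " + partitionFor("user-42", 100));
        }
    }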
On Oct 5, 2013 8:19 PM, "Ravindranath Akila" 
wrote:

> Initially, I thought dynamic topic creation could be used to maintain per-
> user data on Kafka. Then I read that partitions can and should be used for
> this instead.
>
> If a partition is to be used to map a user, can there be a million, or even
> a billion, partitions in a cluster? How does one go about designing such a
> model?
>
> Can the replication tool be used to assign, say partitions 1 - 10,000 on
> replica 1, and 10,001 - 20,000 on replica 2?
>
> If not, since there is a ulimit on the file system, should one model it
> based on a replica/topic/partition approach? Say users 1-10,000 go on topic
> 10k-1, which has 10,000 partitions, and users 10,001-20,000 go on topic
> 10k-2, which has 10,000 partitions.
>
> Simply put, how can a million stateful data points be handled? (I deduced
> that a userid-partition number mapping can be done via a partitioner, but
> unless a server can be configured to handle only a given set of partitions,
> with a range based notation, it is almost impossible to handle a large
> dataset. Is it that Kafka can only handle a limited set of stateful data
> right now?)
>
>
> http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
>
> Btw, why does Kafka have to keep each partition open? Can't a partition be
> opened for read/write only when needed?
>
> Thanks in advance!
>


Re: testing issue with reliable sending

2013-10-06 Thread Neha Narkhede
Does the
leader just wait for the followers in the ISR to consume?

That's right. Until that is done, the producer does not get an ack back. It
has an option of retrying if the previous request times out or fails.

A separate question, can the request.required.acks be set to a higher
positive integer, say "2", to indicate that 2 of say 3 replicas have acked?

Yes, that's possible.

Thanks,
Neha
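
For concreteness, a sketch of the producer settings being discussed, assuming
the 0.8 Java producer (the broker list is a placeholder, and the numeric values
are just examples, not recommendations):

    import java.util.Properties;

    import kafka.producer.ProducerConfig;

    public class AckSettingsSketch {
        public static ProducerConfig config() {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");  // placeholder
            props.put("serializer.class", "kafka.serializer.StringEncoder");

            // -1: the leader waits for the followers in the ISR before acking.
            //  1: the leader acks as soon as its local write succeeds.
            //  n > 1 (e.g. "2"): wait until that many replicas have the message,
            //  as discussed above.
            props.put("request.required.acks", "-1");

            // Retry behavior when a request times out or fails.
            props.put("request.timeout.ms", "10000");
            props.put("message.send.max.retries", "3");
            props.put("retry.backoff.ms", "100");
            return new ProducerConfig(props);
        }
    }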
On Oct 5, 2013 10:18 AM, "Jason Rosenberg"  wrote:

> Thanks for the explanation, Neha... still holding out hope.
>
> So, if request.required.acks=-1, how does the leader confirm that the other
> brokers have consumed the message, before acking to the producer?  Does the
> leader just wait for the followers in the ISR to consume?  Or does the
> leader have a way to push, or ping the followers to consume?
>
> Couldn't that mechanism be used, during a clean shutdown, even if the
> messages were initially produced with acks=1? That is, when shutting down,
> get acks from all ISR members for each partition, before shutting down.
>
> I'm just a bit leery about using -1 across the board, because of the
> performance hit (but for now it seems the best option to use for reliable
> sending).
>
> A separate question, can the request.required.acks be set to a higher
> positive integer, say "2", to indicate that 2 of say 3 replicas have acked?
>  ("request.required.acks" in the name would seem to indicate this).  I'm
> not saying I'd want to use this (we are hoping to use only a replication
> factor of 2).
>
> Jason
>
>
> On Sat, Oct 5, 2013 at 1:00 PM, Neha Narkhede  >wrote:
>
> > Shouldn't this be part of the contract?  It should be able to make sure
> > this happens before shutting down, no?
> >
> > The leader writes messages to its local log and then the replicas consume
> > messages from the leader and write those to their local logs. If you set
> > request.required.acks=1, the ack is sent to the producer only after the
> > leader has written messages to its local log. What you are asking for, is
> > part of the contract if request.required.acks=-1.
> >
> > In this case, if we need to use
> > request.required.acks=-1, that will pretty much prevent any successful
> > message producing while any of the brokers for a partition is
> unavailable.
> >  So, I don't think that's an option.  (Not to mention the performance
> > degradation).
> >
> > You can implement reliable delivery semantics while allowing rolling
> > restart of brokers by setting request.required.acks=-1. When one of the
> > replicas is shut down, the ISR reduces to remove the replica being shut
> > down and the messages will be committed using the new ISR.
> >
> > Thanks,
> > Neha
> >
> >
> > On Fri, Oct 4, 2013 at 11:51 PM, Jason Rosenberg 
> wrote:
> >
> > > Neha,
> > >
> > > I'm not sure I understand.  I would have thought that if the leader
> > > acknowledges receipt of a message, and is then shut down cleanly (with
> > > controlled shutdown enabled), that it would be able to reliably persist
> > any
> > > in memory buffered messages (and replicate them), before shutting down.
> > >  Shouldn't this be part of the contract?  It should be able to make
> sure
> > > this happens before shutting down, no?
> > >
> > > I would understand a message dropped if it were a hard shutdown.
> > >
> > > I'm not sure then how to implement reliable delivery semantics, while
> > > allowing a rolling restart of the broker cluster (or even to tolerate a
> > > single node failure, where one node might be down for awhile and need
> to
> > be
> > > replaced or have a disk repaired).  In this case, if we need to use
> > > request.required.acks=-1, that will pretty much prevent any successful
> > > message producing while any of the brokers for a partition is
> > unavailable.
> > >  So, I don't think that's an option.  (Not to mention the performance
> > > degradation).
> > >
> > > Is there not a way to make this work more reliably with leader only
> > > acknowledgment, and clean/controlled shutdown?
> > >
> > > My test does succeed, as expected, with acks = -1, at least for the 100
> > or
> > > so iterations I've let it run so far.  It does on occasion send
> > duplicates
> > > (but that's ok with me).
> > >
> > > Jason
> > >
> > >
> > > On Fri, Oct 4, 2013 at 6:38 PM, Neha Narkhede  > > >wrote:
> > >
> > > > The occasional single message loss could happen since
> > > > request.required.acks=1 and the leader is shut down before the
> follower
> > > > gets a chance to copy the message. Can you try your test with num
> acks
> > > set
> > > > to -1 ?
> > > >
> > > > Thanks,
> > > > Neha
> > > > On Oct 4, 2013 1:21 PM, "Jason Rosenberg"  wrote:
> > > >
> > > > > All,
> > > > >
> > > > > I'm having an issue with an integration test I've setup.  This is
> > using
> > > > > 0.8-beta1.
> > > > >
> > > > > The test is to verify that no messages are dropped (or the sender
> > gets
> > > an
> > > > > exception thrown back if failure), while doing a rolling restart
> of a
> > > > > cluster 

Re: testing issue with reliable sending

2013-10-06 Thread Jason Rosenberg
On Sun, Oct 6, 2013 at 4:08 AM, Neha Narkhede wrote:

> Does the
> leader just wait for the followers in the ISR to consume?
>
> That's right. Until that is done, the producer does not get an ack back. It
> has an option of retrying if the previous request times out or fails.
>
>
Ok, so if I initiate a controlled shutdown, in which all partitions that a
shutting down broker is leader of get transferred to another broker, why
can't part of that controlled transfer of leadership include ISR
synchronization, such that no data is lost?  Is there a fundamental reason
why that is not possible?  Is it worth filing a Jira for a feature
request?  Or is it technically not possible?

I'm ok with losing data in this case during a hard-shutdown, but for a
controlled shutdown, I'm wondering if there's at least a best effort
attempt to be made to sync all writes to the ISR.

Jason


Re: testing issue with reliable sending

2013-10-06 Thread Neha Narkhede
Ok, so if I initiate a controlled shutdown, in which all partitions that a
shutting down broker is leader of get transferred to another broker, why
can't part of that controlled transfer of leadership include ISR
synchronization, such that no data is lost?  Is there a fundamental reason
why that is not possible?  Is it worth filing a Jira for a feature
request?  Or is it technically not possible?

It is not as straightforward as it seems, and it will slow down the shutdown
operation further (currently several ZooKeeper writes already slow it down)
and also increase the leader unavailability window. But keeping
the performance degradation aside, it is tricky since in order to stop
"new" data from coming in, we need to move the leaders off of the current
leader (broker being shutdown) onto some other follower. Now moving the
leader means some other follower will become the leader and as part of that
will stop copying existing data from the old leader and will start
receiving new data. What you are asking for is to insert some kind of
"wait" before the new follower becomes the leader so that the consumption
of messages is "done". What is the definition of "done" ? This is only
dictated by the log end offset of the old leader and will have to be
included in the new leader transition state change request by the
controller. So this means an extra RPC between the controller and the other
brokers as part of the leader transition. Also there is no guarantee that
the other followers are alive and consuming, so how long does the broker
being shut down wait? Since it is no longer the leader, it technically
cannot kick followers out of the ISR, so ISR handling is another thing that
becomes tricky here.

Also, this sort of catching up on the last few messages is not limited to
just controlled shutdown, but you can extend it to any other leader
transition. So special casing it does not make much sense. This "wait"
combined with the extra RPC will mean extending the leader unavailability
window even further than what we have today.

So this is a fair amount of work just to make sure the last few messages are
replicated to all followers. Instead, what we do is simplify the leader
transition and let the clients handle the retries for requests that have
not made it to the desired number of replicas, which is configurable.

We can discuss that in a JIRA if that helps. Maybe other committers have
more ideas.

Thanks,
Neha
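
As a concrete illustration of "let the clients handle the retries", an
application-side sketch along these lines can resend around a leader transition.
FailedToSendMessageException is what the 0.8 producer throws once its internal
retries are exhausted; the other names here are placeholders, and duplicates are
possible with this approach.

    import kafka.common.FailedToSendMessageException;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;

    public class RetryingSend {
        // Resend a message a few times if the producer gives up, e.g. while
        // leadership is moving during a controlled shutdown.
        static void sendWithRetries(Producer<String, String> producer,
                                    KeyedMessage<String, String> message,
                                    int attempts) throws InterruptedException {
            for (int i = 1; i <= attempts; i++) {
                try {
                    // The producer itself retries up to message.send.max.retries first.
                    producer.send(message);
                    return;
                } catch (FailedToSendMessageException e) {
                    if (i == attempts) {
                        throw e;
                    }
                    Thread.sleep(1000L * i);  // back off before trying again
                }
            }
        }
    }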


On Sun, Oct 6, 2013 at 10:08 AM, Jason Rosenberg  wrote:

> On Sun, Oct 6, 2013 at 4:08 AM, Neha Narkhede  >wrote:
>
> > Does the
> > leader just wait for the followers in the ISR to consume?
> >
> > That's right. Until that is done, the producer does not get an ack back.
> It
> > has an option of retrying if the previous request times out or fails.
> >
> >
> Ok, so if I initiate a controlled shutdown, in which all partitions that a
> shutting down broker is leader of get transferred to another broker, why
> can't part of that controlled transfer of leadership include ISR
> synchronization, such that no data is lost?  Is there a fundamental reason
> why that is not possible?  Is it it worth filing a Jira for a feature
> request?  Or is it technically not possible?
>
> I'm ok with losing data in this case during a hard-shutdown, but for a
> controlled shutdown, I'm wondering if there's at least a best effort
> attempt to be made to sync all writes to the ISR.
>
> Jason
>


Re: Managing Millions of Partitions in Kafka

2013-10-06 Thread Ravindranath Akila
Thanks a lot Neha!

Actually, using keyed messages (with Simple Consumers) was the approach we
took. But it seems we can't map each user to a new partition due to
Zookeeper limitations. Rather, we will have to map a "group" of users on
one partition. Then we can't fetch the messages for only one user.

It seems our data is best put on HBase with a TTL and versioning.

Thanks!

R. A.
On 6 Oct 2013 16:00, "Neha Narkhede"  wrote:

> Kafka is designed to have of the order of few thousands of partitions
> roughly less than 10,000. And the main bottleneck is zookeeper. A better
> way to design such a system is to have fewer partitions and use keyed
> messages to distribute the data over a fixed set of partitions.
>
> Thanks,
> Neha
> On Oct 5, 2013 8:19 PM, "Ravindranath Akila" 
> wrote:
>
> > Initially, I thought dynamic topic creation can be used to maintain per
> > user data on Kafka. The I read that partitions can and should be used for
> > this instead.
> >
> > If a partition is to be used to map a user, can there be a million, or
> even
> > billion partitions in a cluster? How does one go about designing such a
> > model.
> >
> > Can the replication tool be used to assign, say partitions 1 - 10,000 on
> > replica 1, and 10,001 - 20,000 on replica 2?
> >
> > If not, since there is a ulimit on the file system, should one model it
> > based on a replica/topic/partition approach. Say users 1-10,000 go on
> topic
> > 10k-1, and has 10,000 partitions, and users 10,0001-20,000 go on topic
> > 10k-2, and has 10,000 partitions.
> >
> > Simply put, how can a million stateful data points be handled? (I deduced
> > that a userid-partition number mapping can be done via a partitioner, but
> > unless a server can be configured to handle only a given set of
> partitions,
> > with a range based notation, it is almost impossible to handle a large
> > dataset. Is it that Kafka can only handle a limited set of stateful data
> > right now?)
> >
> >
> >
> http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
> >
> > Btw, why does Kafka have to keep open each partition? Can't a partition
> be
> > opened for read/write when needed only?
> >
> > Thanks in advance!
> >
>


Re: Managing Millions of Partitions in Kafka

2013-10-06 Thread Benjamin Black
What you are discovering is that Kafka is a message broker, not a database.


On Sun, Oct 6, 2013 at 5:34 PM, Ravindranath Akila <
ravindranathak...@gmail.com> wrote:

> Thanks a lot Neha!
>
> Actually, using keyed messages(with Simple Consumers) was the approach we
> took. But it seems we can't map each user to a new partition due to
> Zookeeper limitations. Rather, we will have to map a "group" of users on
> one partition. Then we can't fetch the messages for only one user.
>
> It seems our data is best put on HBase with a TTL and versioning.
>
> Thanks!
>
> R. A.
> On 6 Oct 2013 16:00, "Neha Narkhede"  wrote:
>
> > Kafka is designed to have of the order of few thousands of partitions
> > roughly less than 10,000. And the main bottleneck is zookeeper. A better
> > way to design such a system is to have fewer partitions and use keyed
> > messages to distribute the data over a fixed set of partitions.
> >
> > Thanks,
> > Neha
> > On Oct 5, 2013 8:19 PM, "Ravindranath Akila" <
> ravindranathak...@gmail.com>
> > wrote:
> >
> > > Initially, I thought dynamic topic creation can be used to maintain per
> > > user data on Kafka. The I read that partitions can and should be used
> for
> > > this instead.
> > >
> > > If a partition is to be used to map a user, can there be a million, or
> > even
> > > billion partitions in a cluster? How does one go about designing such a
> > > model.
> > >
> > > Can the replication tool be used to assign, say partitions 1 - 10,000
> on
> > > replica 1, and 10,001 - 20,000 on replica 2?
> > >
> > > If not, since there is a ulimit on the file system, should one model it
> > > based on a replica/topic/partition approach. Say users 1-10,000 go on
> > topic
> > > 10k-1, and has 10,000 partitions, and users 10,0001-20,000 go on topic
> > > 10k-2, and has 10,000 partitions.
> > >
> > > Simply put, how can a million stateful data points be handled? (I
> deduced
> > > that a userid-partition number mapping can be done via a partitioner,
> but
> > > unless a server can be configured to handle only a given set of
> > partitions,
> > > with a range based notation, it is almost impossible to handle a large
> > > dataset. Is it that Kafka can only handle a limited set of stateful
> data
> > > right now?)
> > >
> > >
> > >
> >
> http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
> > >
> > > Btw, why does Kafka have to keep open each partition? Can't a partition
> > be
> > > opened for read/write when needed only?
> > >
> > > Thanks in advance!
> > >
> >
>


Re: Managing Millions of Partitions in Kafka

2013-10-06 Thread Ravindranath Akila
Actually, we need a broker, but a more stateful one. Hence the decision to
use a TTL on HBase.
On 7 Oct 2013 08:38, "Benjamin Black"  wrote:

> What you are discovering is that Kafka is a message broker, not a database.
>
>
> On Sun, Oct 6, 2013 at 5:34 PM, Ravindranath Akila <
> ravindranathak...@gmail.com> wrote:
>
> > Thanks a lot Neha!
> >
> > Actually, using keyed messages(with Simple Consumers) was the approach we
> > took. But it seems we can't map each user to a new partition due to
> > Zookeeper limitations. Rather, we will have to map a "group" of users on
> > one partition. Then we can't fetch the messages for only one user.
> >
> > It seems our data is best put on HBase with a TTL and versioning.
> >
> > Thanks!
> >
> > R. A.
> > On 6 Oct 2013 16:00, "Neha Narkhede"  wrote:
> >
> > > Kafka is designed to have of the order of few thousands of partitions
> > > roughly less than 10,000. And the main bottleneck is zookeeper. A
> better
> > > way to design such a system is to have fewer partitions and use keyed
> > > messages to distribute the data over a fixed set of partitions.
> > >
> > > Thanks,
> > > Neha
> > > On Oct 5, 2013 8:19 PM, "Ravindranath Akila" <
> > ravindranathak...@gmail.com>
> > > wrote:
> > >
> > > > Initially, I thought dynamic topic creation can be used to maintain
> per
> > > > user data on Kafka. The I read that partitions can and should be used
> > for
> > > > this instead.
> > > >
> > > > If a partition is to be used to map a user, can there be a million,
> or
> > > even
> > > > billion partitions in a cluster? How does one go about designing
> such a
> > > > model.
> > > >
> > > > Can the replication tool be used to assign, say partitions 1 - 10,000
> > on
> > > > replica 1, and 10,001 - 20,000 on replica 2?
> > > >
> > > > If not, since there is a ulimit on the file system, should one model
> it
> > > > based on a replica/topic/partition approach. Say users 1-10,000 go on
> > > topic
> > > > 10k-1, and has 10,000 partitions, and users 10,0001-20,000 go on
> topic
> > > > 10k-2, and has 10,000 partitions.
> > > >
> > > > Simply put, how can a million stateful data points be handled? (I
> > deduced
> > > > that a userid-partition number mapping can be done via a partitioner,
> > but
> > > > unless a server can be configured to handle only a given set of
> > > partitions,
> > > > with a range based notation, it is almost impossible to handle a
> large
> > > > dataset. Is it that Kafka can only handle a limited set of stateful
> > data
> > > > right now?)
> > > >
> > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
> > > >
> > > > Btw, why does Kafka have to keep open each partition? Can't a
> partition
> > > be
> > > > opened for read/write when needed only?
> > > >
> > > > Thanks in advance!
> > > >
> > >
> >
>


Re: Managing Millions of Partitions in Kafka

2013-10-06 Thread Benjamin Black
Ha ha, yes, exactly, you need a database. Kafka is a wonderful tool, but
not the right one for a job like that.


On Sun, Oct 6, 2013 at 7:03 PM, Ravindranath Akila <
ravindranathak...@gmail.com> wrote:

> Actually, we need a broker. But a more stateful one. Hence the decision to
> use TTL on hbase.
> On 7 Oct 2013 08:38, "Benjamin Black"  wrote:
>
> > What you are discovering is that Kafka is a message broker, not a
> database.
> >
> >
> > On Sun, Oct 6, 2013 at 5:34 PM, Ravindranath Akila <
> > ravindranathak...@gmail.com> wrote:
> >
> > > Thanks a lot Neha!
> > >
> > > Actually, using keyed messages(with Simple Consumers) was the approach
> we
> > > took. But it seems we can't map each user to a new partition due to
> > > Zookeeper limitations. Rather, we will have to map a "group" of users
> on
> > > one partition. Then we can't fetch the messages for only one user.
> > >
> > > It seems our data is best put on HBase with a TTL and versioning.
> > >
> > > Thanks!
> > >
> > > R. A.
> > > On 6 Oct 2013 16:00, "Neha Narkhede"  wrote:
> > >
> > > > Kafka is designed to have of the order of few thousands of partitions
> > > > roughly less than 10,000. And the main bottleneck is zookeeper. A
> > better
> > > > way to design such a system is to have fewer partitions and use keyed
> > > > messages to distribute the data over a fixed set of partitions.
> > > >
> > > > Thanks,
> > > > Neha
> > > > On Oct 5, 2013 8:19 PM, "Ravindranath Akila" <
> > > ravindranathak...@gmail.com>
> > > > wrote:
> > > >
> > > > > Initially, I thought dynamic topic creation can be used to maintain
> > per
> > > > > user data on Kafka. The I read that partitions can and should be
> used
> > > for
> > > > > this instead.
> > > > >
> > > > > If a partition is to be used to map a user, can there be a million,
> > or
> > > > even
> > > > > billion partitions in a cluster? How does one go about designing
> > such a
> > > > > model.
> > > > >
> > > > > Can the replication tool be used to assign, say partitions 1 -
> 10,000
> > > on
> > > > > replica 1, and 10,001 - 20,000 on replica 2?
> > > > >
> > > > > If not, since there is a ulimit on the file system, should one
> model
> > it
> > > > > based on a replica/topic/partition approach. Say users 1-10,000 go
> on
> > > > topic
> > > > > 10k-1, and has 10,000 partitions, and users 10,0001-20,000 go on
> > topic
> > > > > 10k-2, and has 10,000 partitions.
> > > > >
> > > > > Simply put, how can a million stateful data points be handled? (I
> > > deduced
> > > > > that a userid-partition number mapping can be done via a
> partitioner,
> > > but
> > > > > unless a server can be configured to handle only a given set of
> > > > partitions,
> > > > > with a range based notation, it is almost impossible to handle a
> > large
> > > > > dataset. Is it that Kafka can only handle a limited set of stateful
> > > data
> > > > > right now?)
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
> > > > >
> > > > > Btw, why does Kafka have to keep open each partition? Can't a
> > partition
> > > > be
> > > > > opened for read/write when needed only?
> > > > >
> > > > > Thanks in advance!
> > > > >
> > > >
> > >
> >
>


Re: testing issue with reliable sending

2013-10-06 Thread Jason Rosenberg
Thanks, Neha, for the continued insight.

What you describe as a possible solution is what I was thinking (although I
wasn't as concerned as maybe I should be with the added delay of the new
leader delaying processing new requests while it finishes consuming from
the old leader, and communicates back and forth to complete the leader
hand-off).

E.g., isn't the definition of a broker being in the ISR that it is keeping
itself up to date with the leader of the ISR (within an allowed replication
lag)?  So, it should be possible to elect a new leader, have it buffer
incoming requests while it finishes replicating everything from the old
leader (which it should complete within an allowed replication lag
timeout), and then start acking any buffered requests.
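
(For reference, the "allowed replication lag" above corresponds to broker-side
settings along these lines; a sketch assuming 0.8-era property names, written as
the java.util.Properties you might hand to an embedded broker in an integration
test. The values shown are the commonly cited defaults, not recommendations.)

    import java.util.Properties;

    import kafka.server.KafkaConfig;

    public class ReplicaLagSketch {
        public static KafkaConfig embeddedBrokerConfig() {
            Properties props = new Properties();
            props.put("broker.id", "1");
            props.put("port", "9092");
            props.put("log.dir", "/tmp/kafka-test-logs");      // placeholder path
            props.put("zookeeper.connect", "localhost:2181");  // placeholder
            // A follower drops out of the ISR if it hasn't fetched for this long,
            // or if it trails the leader by more than this many messages.
            props.put("replica.lag.time.max.ms", "10000");
            props.put("replica.lag.max.messages", "4000");
            return new KafkaConfig(props);
        }
    }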

I guess this buffering period would be akin to the leader 'unavailability'
window, but in reality, it is just a delay (and shouldn't be much more than
the replication lag timeout).  The producing client can decide to time out
the request if it's taking too long, and retry it (that's normal anyway if
a producer fails to get an ack, etc.).

So, as long as the old leader atomically starts rejecting incoming requests
at the time it relinquishes leadership, then producer requests will fail
fast, initiate a new metadata request to find the new leader, and continue
on with the new leader (possibly after a bit of a replication catch-up
delay).

The old leader can then proceed with shutdown after the new leader has
caught up (which it will signal with an RPC).

I realize there are all sorts of edge cases here, but it seems there should
be a way to make it work.

I guess I'm willing to allow a bit of an 'unavailability' delay, rather
than have messages silently acked then lost during a controlled
shutdown/new leader election.

My hunch is that, in the steady state, when leadership is stable and brokers
aren't being shut down, the performance benefit of being able to use
request.required.acks=1 (instead of -1) far outweighs any momentary
performance blip from a leader availability delay during a leadership
change (which should be fully recoverable and retryable by a concerned
producer client).

Now, of course, if I want to guard against a hard-shutdown, then that's a
whole different ball of wax!

Jason


On Sun, Oct 6, 2013 at 4:30 PM, Neha Narkhede wrote:

> Ok, so if I initiate a controlled shutdown, in which all partitions that a
> shutting down broker is leader of get transferred to another broker, why
> can't part of that controlled transfer of leadership include ISR
> synchronization, such that no data is lost?  Is there a fundamental reason
> why that is not possible?  Is it it worth filing a Jira for a feature
> request?  Or is it technically not possible?
>
> It is not as straightforward as it seems and it will slow down the shut
> down operation furthermore (currently several zookeeper writes already slow
> it down) and also increase the leader unavailability window. But keeping
> the performance degradation aside, it is tricky since in order to stop
> "new" data from coming in, we need to move the leaders off of the current
> leader (broker being shutdown) onto some other follower. Now moving the
> leader means some other follower will become the leader and as part of that
> will stop copying existing data from the old leader and will start
> receiving new data. What you are asking for is to insert some kind of
> "wait" before the new follower becomes the leader so that the consumption
> of messages is "done". What is the definition of "done" ? This is only
> dictated by the log end offset of the old leader and will have to be
> included in the new leader transition state change request by the
> controller. So this means an extra RPC between the controller and the other
> brokers as part of the leader transition. Also there is no guarantee that
> the other followers are alive and consuming, so how long does the broker
> being shutdown wait ? Since it is no longer the leader, it technically
> cannot kick followers out of the ISR, so ISR handling is another thing that
> becomes tricky here.
>
> Also, this sort of catching up on the last few messages is not limited to
> just controlled shutdown, but you can extend it to any other leader
> transition. So special casing it does not make much sense. This "wait"
> combined with the extra RPC will mean extending the leader unavailability
> window even further than what we have today.
>
> So this is fair amount of work to make sure last few messages are
> replicated to all followers. Instead what we do is to simplify the leader
> transition and let the clients handle the retries for requests that have
> not made it to the desired number of replicas, which is configurable.
>
> We can discuss that in a JIRA if that helps. May be other committers have
> more ideas.
>
> Thanks,
> Neha
>
>
> On Sun, Oct 6, 2013 at 10:08 AM, Jason Rosenberg  wrote:
>
> > On Sun, Oct 6, 2013 at 4:08 AM, Neha Narkhede  > >wrote:
> >
> > > Does th

producer request.timeout.ms needs to be updated in docs

2013-10-06 Thread Jason Rosenberg
Seems the default for this is no 1
Online doc shows 1500

Just curious, why was this value updated?

Jason