Re: 0.8 best practices for migrating / electing leaders in failure situations?

2013-03-22 Thread Scott Clasen
Thanks! Would there be any difference if I instead deleted all the Kafka data from zookeeper and booted 3 instances with different broker ids? Would clients with cached broker id lists have any issue? Sent from my iPhone On Mar 22, 2013, at 9:15 PM, Jun Rao wrote: > In scenario 2, you can b

Re: 0.8 best practices for migrating / electing leaders in failure situations?

2013-03-22 Thread Jun Rao
In scenario 2, you can bring up 3 new brokers with the same broker ids. You won't get the data back. However, new data can be published to and consumed from the new brokers. Thanks, Jun On Fri, Mar 22, 2013 at 2:17 PM, Scott Clasen wrote: > Thanks Neha- > > To Clarify... > > *In scenario => 1 w
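
For reference, "the same broker id" is just the broker.id property in each broker's server.properties, so the replacements simply reuse the dead brokers' ids. A minimal sketch for the replacement of broker 1, with hypothetical hosts and 0.8 property names:

    # server.properties on the replacement host
    broker.id=1
    zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
    # fresh ephemeral disk; the old log data is gone
    log.dirs=/mnt/kafka-logs

Once all three replacements register in ZK under the old ids, producers and consumers find them through the normal metadata requests.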

Re: Kafka mirroring fault tolerance

2013-03-22 Thread Jun Rao
Yes, this is true if you have only 1 broker in the target cluster. If you set up multiple brokers in the target cluster, mirror maker will send messages to available brokers. Thanks, Jun On Fri, Mar 22, 2013 at 12:02 PM, Riju Kallivalappil < riju.kallivalap...@corp.247customer.com> wrote: > Hi,
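
A sketch of the setup Jun describes, with hypothetical file names; in 0.8 the mirror maker's embedded producer is pointed at the multi-broker target cluster, so it can keep sending to whichever target brokers are alive (older releases discover target brokers via ZK instead):

    bin/kafka-run-class.sh kafka.tools.MirrorMaker \
      --consumer.config source-consumer.properties \
      --producer.config target-producer.properties \
      --whitelist=".*"

    # target-producer.properties (0.8-style)
    metadata.broker.list=target-b1.example.com:9092,target-b2.example.com:9092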

Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
We've made some progress in our testing. While I do not have a good explanation for all of the improved behavior today, we have been able to move a substantial number of messages through the system without any exceptions (> 800K messages). The big things between last night's mess and today were:

Re: 0.8 best practices for migrating / electing leaders in failure situations?

2013-03-22 Thread Neha Narkhede
> *In scenario => 1 will the new broker get all messages on the other brokers > replicated to it? Yes; unless it gets all the messages, it does not reflect the new replicas' state in zookeeper. > *In Scenario 2 => it is clear that the data is gone, but I still need > producers to be able to send a

Re: 0.8 best practices for migrating / electing leaders in failure situations?

2013-03-22 Thread Scott Clasen
Thanks Neha- To Clarify... *In scenario => 1 will the new broker get all messages on the other brokers replicated to it? *In Scenario 2 => it is clear that the data is gone, but I still need producers to be able to send and consumers to receive on the same topic. In my testing today I was unable

Re: 0.8 best practices for migrating / electing leaders in failure situations?

2013-03-22 Thread Neha Narkhede
* Scenario 1: BrokerID 1,2,3. Broker 2 dies. Here, you can use the reassign partitions tool: for all partitions that had a replica on broker 2, move it to broker 4. * Scenario 2: BrokerID 1,2,3. Catastrophic failure; 1,2,3 die but ZK is still there. There is no way to recover any data here since ther
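
For concreteness, a sketch of scenario 1 with the reassignment tool; the topic, partition, and replica lists here are hypothetical, and the option names follow the released 0.8 tool (pre-release builds may differ):

    # reassign.json: replace dead broker 2 with new broker 4 in the replica list
    {"version":1,
     "partitions":[{"topic":"my-topic","partition":0,"replicas":[1,3,4]}]}

    bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
      --reassignment-json-file reassign.json --execute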

Re: Socket timeouts in 0.8

2013-03-22 Thread Neha Narkhede
Bob, We fixed a bunch of bugs in the log layer recently. Are you running the latest version of the code from the 0.8 branch? Thanks, Neha On Fri, Mar 22, 2013 at 11:27 AM, Bob Jervis wrote: > I'm also seeing in the midst of the chaos (our app is generating 15GB of > logs), the following even
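
A build from the 0.8 branch at the time looked roughly like this; sbt was the build tool before the move to gradle, and the GitHub URL is the ASF mirror:

    git clone https://github.com/apache/kafka.git
    cd kafka && git checkout 0.8
    ./sbt update && ./sbt package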

Kafka mirroring fault tolerance

2013-03-22 Thread Riju Kallivalappil
Hi, I've a question about fault tolerance of the Kafka mirror maker (0.7.1). Let's say that I've a mirroring setup with topics in broker B1 mirrored to broker B2. On B2, I've the Kafka mirror maker and Kafka broker processes running. Now, the following is what I noticed when the Kafka broker process on B2 is r

Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
I'm also seeing in the midst of the chaos (our app is generating 15GB of logs), the following event on one of our brokers: 2013-03-22 17:43:39,257 FATAL kafka.server.KafkaApis: [KafkaApi-1] Halting due to unrecoverable I/O error while handling produce request: kafka.common.KafkaStorageException: I

Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
I am getting the logs and I am trying to make sense of them. I see a 'Received Request' log entry that appears to be what is coming in from our app. I don't see any 'Completed Request' entries that correspond to those. The only completed entries I see for the logs in question are from the replic

0.8 best practices for migrating / electing leaders in failure situations?

2013-03-22 Thread Scott Clasen
What would the recommended practice be for the following scenarios? Running on EC2, ephemeral disks only for kafka. There are 3 kafka servers. The broker ids are always increasing. If a broker dies it's never coming back. All topics have a replication factor of 3. * Scenario 1: BrokerID 1,2,3

Re: Socket timeouts in 0.8

2013-03-22 Thread Jun Rao
The metadata request is sent to the broker, which will read from ZK. I suggest that you turn on trace level logging for class kafka.network.RequestChannel$ in all brokers. The log will tell you how long each metadata request takes on the broker. You can then set your socket timeout in the producer a
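
The stock broker log4j.properties already defines this logger and a requestAppender writing to kafka-request.log, so turning it up is a one-line change; a sketch:

    # log4j.properties on each broker
    log4j.logger.kafka.network.RequestChannel$=TRACE, requestAppender
    log4j.additivity.kafka.network.RequestChannel$=false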

Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
How many network threads should we be running with a 2-broker cluster (and replication=2)? We have roughly 150-400 SimpleConsumers running, depending on the application state. We can spend some engineering time consolidating many of the consumers, but the figure I've cited is for o
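
For reference, the two broker settings involved; the values below are illustrative, not a recommendation:

    # server.properties
    num.network.threads=3   # threads reading/writing the sockets
    num.io.threads=8        # threads executing requests, including disk I/O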

Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
I've tried this and it appears that we are still seeing the issue. Here is a stack trace of one of the socket timeout exceptions we are seeing (we converted to the SimpleConsumer): 2013-03-22 04:54:51,807 INFO kafka.client.ClientUtils$: Fetching metadata for topic Set(v1-japanese-0, v1-indonesian
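
On the consumer side, the socket timeout for a SimpleConsumer is its third constructor argument; a minimal sketch against the 0.8 javaapi, with a hypothetical host and client id:

    import kafka.javaapi.consumer.SimpleConsumer;

    // host, port, socket timeout (ms), receive buffer bytes, client id
    SimpleConsumer consumer = new SimpleConsumer(
        "broker1.example.com", 9092, 120000, 64 * 1024, "my-client");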

Re: Replicas for partition are dead

2013-03-22 Thread Jason Huang
Makes sense. Thanks! Jason On Fri, Mar 22, 2013 at 11:17 AM, Neha Narkhede wrote: > Jason, > > We will document the migration steps. The only reason you have to wipe out > data this time is that you were running an older version and we made some > zookeeper format changes. Such changes are expe

Re: Connection reset by peer

2013-03-22 Thread Yonghui Zhao
Thanks Jun! Will tune our GC settings. Sent from my iPad On 2013-3-22, at 23:05, Jun Rao wrote: > A typical reason for frequent rebalancing is consumer-side GC. If so, you > will see logs in the consumer saying something like "expired session" for ZK. > Occasional rebalances are fine. Too many rebalances can slo

Re: Replicas for partition are dead

2013-03-22 Thread Neha Narkhede
Jason, We will document the migration steps. The only reason you have to wipe out data this time is that you were running an older version and we made some zookeeper format changes. Such changes are expected until the final release. Once it is released, we don't expect to make such big changes. T

Re: Consume from X messages ago

2013-03-22 Thread James Englert
Thanks for the help. FWIW, I ended up writing a simple util that I can use as my consumer is starting up to move the offset back. It *seems* to work decently. Thoughts? Would this be something helpful to contribute back to Kafka, or is the idea just poor? /** * Attempts to
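
Since the code is cut off in the archive, here is an independent sketch of the same idea against the 0.8 APIs (not James's actual util): fetch the partition's latest offset, step back X messages - valid in 0.8 because offsets are consecutive sequence numbers - and write the result to the group's ZK offset node before the consumer connects. It assumes a ZkClient built with a string serializer, as Kafka's own tools use:

    import java.util.Collections;
    import java.util.Map;
    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.OffsetRequest;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import org.I0Itec.zkclient.ZkClient;

    public class OffsetRewind {
      // Rewind a group's offset for one partition by `count` messages.
      // Assumes the group has committed at least once, so the ZK node exists.
      public static long rewind(SimpleConsumer consumer, ZkClient zk,
                                String group, String topic, int partition, long count) {
        TopicAndPartition tap = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
            Collections.singletonMap(tap,
                new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));
        OffsetResponse resp = consumer.getOffsetsBefore(new OffsetRequest(
            info, kafka.api.OffsetRequest.CurrentVersion(), "offset-rewind"));
        long latest = resp.offsets(topic, partition)[0];
        long target = Math.max(0, latest - count);
        // the high-level consumer reads its start position from this node
        zk.writeData("/consumers/" + group + "/offsets/" + topic + "/" + partition,
                     Long.toString(target));
        return target;
      }
    }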

Re: Replicas for partition are dead

2013-03-22 Thread Jason Huang
I see. Since I am not running anything in production at this point - I will probably just do this. However, if someone runs 0.8.0 in production and they want to upgrade to the latest version, how should they migrate their messages? Maybe there should be something documented in the wiki for this?

Re: Replicas for partition are dead

2013-03-22 Thread Jun Rao
The easiest way is to wipe out both ZK and kafka data and start from scratch. Thanks, Jun On Fri, Mar 22, 2013 at 6:51 AM, Jason Huang wrote: > Thanks Jun. > > I have built the new kafka version and start the services. You > mentioned that ZK data structure has been changed - does that mean we
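
A sketch of what "wipe out both" looks like with default paths; rmr is the ZooKeeper 3.4 CLI's recursive delete, and the znode list is the 0.8 layout (there may be others, e.g. /controller_epoch):

    # stop all brokers, then on each broker host:
    rm -rf /tmp/kafka-logs        # whatever log.dirs points at

    # then clear Kafka's znodes:
    zkCli.sh -server zk1.example.com:2181 rmr /brokers
    zkCli.sh -server zk1.example.com:2181 rmr /consumers
    zkCli.sh -server zk1.example.com:2181 rmr /controller
    zkCli.sh -server zk1.example.com:2181 rmr /admin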

Re: Connection reset by peer

2013-03-22 Thread Jun Rao
A typical reason for frequent rebalancing is consumer-side GC. If so, you will see logs in the consumer saying something like "expired session" for ZK. Occasional rebalances are fine. Too many rebalances can slow down consumption, and you will need to tune your GC settings. Thanks, Jun On Thu, Mar 21
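
Two knobs that usually matter here; the property names are the 0.8 consumer config names, and the GC flags are a common starting point rather than a tested recommendation:

    # consumer.properties: give the ZK session headroom over GC pauses
    zookeeper.session.timeout.ms=10000     # default is 6000
    zookeeper.connection.timeout.ms=10000

    # consumer JVM flags: favor short pauses, log GC to confirm the diagnosis
    -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -verbose:gc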

Re: Replicas for partition are dead

2013-03-22 Thread Jason Huang
Thanks Jun. I have built the new kafka version and started the services. You mentioned that the ZK data structure has been changed - does that mean we can't reload the previous messages from the current log files? I actually tried to copy the log files (.log and .index) to the new kafka instance but get th