Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
We've made some progress in our testing. While I do not have a good explanation for all of today's better behavior, we have been able to move a substantial number of messages (> 800K) through the system today without any exceptions. The big differences between last night's mess and today were:

Re: Socket timeouts in 0.8

2013-03-22 Thread Neha Narkhede
Bob, We fixed a bunch of bugs in the log layer recently. Are you running the latest version of the code from the 0.8 branch? Thanks, Neha On Fri, Mar 22, 2013 at 11:27 AM, Bob Jervis wrote: > I'm also seeing in the midst of the chaos (our app is generating 15GB of > logs), the following even

Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
I'm also seeing in the midst of the chaos (our app is generating 15GB of logs), the following event on one of our brokers: 2013-03-22 17:43:39,257 FATAL kafka.server.KafkaApis: [KafkaApi-1] Halting due to unrecoverable I/O error while handling produce request: kafka.common.KafkaStorageException: I

Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
I am getting the logs and I am trying to make sense of them. I see a 'Received Request' log entry that appears to be what is coming in from our app. I don't see any 'Completed Request' entries that correspond to those. The only completed entries I see for the logs in question are from the replic

Re: Socket timeouts in 0.8

2013-03-22 Thread Jun Rao
The metadata request is sent to the broker, which will read from ZK. I suggest that you turn on trace-level logging for class kafka.network.RequestChannel$ in all brokers. The log will tell you how long each metadata request takes on the broker. You can then set your socket timeout in the producer a
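Once the trace log gives per-request durations, choosing the producer socket timeout is simple arithmetic. A minimal sketch, assuming the broker-side metadata request times have already been pulled out of the logs into a list of milliseconds (the log parsing is not shown). The function name `suggest_socket_timeout_ms`, the 2x headroom factor, and the 10-second floor are hypothetical choices for illustration, not Kafka settings:

```python
import math


def suggest_socket_timeout_ms(durations_ms, headroom=2.0, floor_ms=10_000):
    """Return a socket timeout (ms) comfortably above the slowest observed
    metadata request.

    durations_ms: broker-side metadata request times in milliseconds,
                  assumed to be collected from trace-level RequestChannel$ logs.
    headroom:     hypothetical safety multiplier applied to the worst case.
    floor_ms:     hypothetical lower bound so a quiet sample never yields
                  a tiny timeout.
    """
    if not durations_ms:
        raise ValueError("need at least one observed duration")
    worst = max(durations_ms)
    # Scale the worst observation, round up to a whole millisecond,
    # and never go below the floor.
    return max(floor_ms, math.ceil(worst * headroom))
```

For example, observed times of 1200, 8700 and 23000 ms give a suggested timeout of 46000 ms. Basing the value on the worst case rather than the average matters here, because a single slow metadata request is enough to trigger the timeout.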

Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
How many network threads should we be running with a 2-broker cluster (and replication=2)? We have roughly 150-400 SimpleConsumers running, depending on the application state. We can spend some engineering time consolidating many of the consumers, but the figure I've cited is for o

Re: Socket timeouts in 0.8

2013-03-22 Thread Bob Jervis
I've tried this and it appears that we are still seeing the issue. Here is a stack trace of one of the socket timeout exceptions we are seeing (we converted to the SimpleConsumer): 2013-03-22 04:54:51,807 INFO kafka.client.ClientUtils$: Fetching metadata for topic Set(v1-japanese-0, v1-indonesian

Re: Socket timeouts in 0.8

2013-03-21 Thread Jun Rao
Bob, Currently, the metadata request needs to do at least one ZK read per partition. So the more topics/partitions you have, the longer the request takes. So, you need to increase the request timeout. Try something like 60 * 1000 ms. Thanks, Jun On Thu, Mar 21, 2013 at 12:46 PM, Bob Jervis wro
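Since the metadata request does at least one ZK read per partition, its cost grows with the partition count. A rough lower-bound sketch, assuming the reads happen one after another and that `zk_read_ms` (a hypothetical input you would measure yourself) is the typical latency of a single ZK read. The 60 * 1000 ms default mirrors the timeout suggested above; the function names are illustrative only:

```python
def min_metadata_time_ms(num_partitions, zk_read_ms):
    """Lower bound on broker-side metadata request time, assuming one
    sequential ZooKeeper read per partition. Real requests may perform
    additional reads, so actual times can be higher."""
    if num_partitions < 0 or zk_read_ms < 0:
        raise ValueError("inputs must be non-negative")
    return num_partitions * zk_read_ms


def timeout_is_safe(num_partitions, zk_read_ms, timeout_ms=60 * 1000):
    """True if the timeout exceeds the estimated lower bound.

    Passing this check is necessary but not sufficient: the bound ignores
    extra ZK reads, request queueing, and network time."""
    return timeout_ms > min_metadata_time_ms(num_partitions, zk_read_ms)
```

With a hypothetical 5 ms per read, 2,000 partitions need at least 10,000 ms, well inside 60 seconds, while 15,000 partitions need at least 75,000 ms and would still time out.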

Socket timeouts in 0.8

2013-03-21 Thread Bob Jervis
We are seeing horrible problems. We cannot move data through our 0.8 broker because we are getting socket timeout exceptions and I cannot figure out what the settings should be. The fetch metadata stuff is throwing these exceptions and no matter how I tweak the timeouts, I still get horrible timeouts