We've made some progress in our testing. While I do not have a good
explanation for all of the improved behavior, we have been able to move a
substantial number of messages through the system today without any
exceptions (> 800K messages).
The big changes between last night's mess and today were:
Bob,
We fixed a bunch of bugs in the log layer recently. Are you running the
latest version of the code from the 0.8 branch?
Thanks,
Neha
On Fri, Mar 22, 2013 at 11:27 AM, Bob Jervis wrote:
I'm also seeing in the midst of the chaos (our app is generating 15GB of
logs), the following event on one of our brokers:
2013-03-22 17:43:39,257 FATAL kafka.server.KafkaApis: [KafkaApi-1] Halting
due to unrecoverable I/O error while handling produce request:
kafka.common.KafkaStorageException: I
I am getting the logs and I am trying to make sense of them. I see a
'Received Request' log entry that appears to be what is coming in from our
app. I don't see any 'Completed Request' entries that correspond to those.
The only completed entries I see for the logs in question are from the
replic
The metadata request is sent to the broker, which will read from ZK. I
suggest that you turn on trace level logging for class
kafka.network.RequestChannel$ in all brokers. The log will tell you how
long each metadata request takes on the broker. You can then set your socket
timeout in the producer a
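In case it helps, a minimal sketch of what that looks like, assuming the log4j 1.x API that the 0.8 broker uses. The properties-file line in the comment is the static equivalent you would add on each broker; treating this as runtime tuning code is my own framing, not something from this thread.

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class RequestTraceLogging {
    public static void main(String[] args) {
        // Static equivalent in each broker's log4j.properties:
        //   log4j.logger.kafka.network.RequestChannel$=TRACE
        // Raising this logger to TRACE makes the broker log how long each
        // request, including topic metadata requests, takes to complete.
        Logger.getLogger("kafka.network.RequestChannel$").setLevel(Level.TRACE);
    }
}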
How many network threads should we be running with a 2-broker
cluster (and replication=2)? We have roughly 150-400 SimpleConsumers
running, depending on the application state. We can spend some engineering
time consolidating many of the consumers, but the figure I've cited is for
o
I've tried this and it appears that we are still seeing the issue. Here is
a stack trace of one of the socket timeout exceptions we are seeing (we
converted to the SimpleConsumer):
2013-03-22 04:54:51,807 INFO kafka.client.ClientUtils$: Fetching metadata
for topic Set(v1-japanese-0, v1-indonesian
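For context, a minimal sketch of where that socket timeout is set on the 0.8 SimpleConsumer; the host, port, buffer size, and the 60-second value below are illustrative assumptions, not the poster's actual settings.

import kafka.javaapi.consumer.SimpleConsumer;

public class SimpleConsumerTimeout {
    public static void main(String[] args) {
        // The third constructor argument is the socket timeout in milliseconds;
        // too small a value is one way metadata/fetch calls end up throwing
        // SocketTimeoutException under load.
        SimpleConsumer consumer = new SimpleConsumer(
                "broker1.example.com", // hypothetical broker host
                9092,                  // broker port
                60 * 1000,             // socket timeout (ms)
                64 * 1024,             // fetch buffer size (bytes)
                "example-client");     // client id
        consumer.close();
    }
}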
Bob,
Currently, the metadata request needs to do at least one ZK read per
partition. So the more topics/partitions you have, the longer the request
takes. So, you need to increase the request timeout. Try something like 60
* 1000 ms.
Thanks,
Jun
On Thu, Mar 21, 2013 at 12:46 PM, Bob Jervis wro
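A minimal sketch of applying that suggestion on the 0.8 producer side, assuming request.timeout.ms is the relevant setting; the broker list and serializer below are placeholders.

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

public class ProducerTimeoutConfig {
    public static Producer<String, String> createProducer() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Allow up to 60 seconds, as suggested above, so that metadata requests
        // doing one ZK read per partition have time to complete.
        props.put("request.timeout.ms", String.valueOf(60 * 1000));
        return new Producer<String, String>(new ProducerConfig(props));
    }
}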
We are seeing horrible problems. We cannot move data through our 0.8
broker because we are getting socket timeout exceptions and I cannot figure
out what the settings should be. The fetch metadata stuff is throwing these
exceptions and no matter how I tweak the timeouts, I still get horrible
timeouts