Re: upgrade from beta1 to 0.81

2013-12-23 Thread Jun Rao
Did you hard kill the broker? If so, do you have the fix for KAFKA-1112? Thanks, Jun On Fri, Dec 20, 2013 at 4:05 PM, Drew Goya wrote: > This is the exception I ran into, I was able to fix it by deleting the > /data/kafka/logs/Events2-124/ directory. That directory contained a non > zero si

Re: Connection Timeouts

2013-12-23 Thread Jun Rao
These configs are reasonable and shouldn't affect consumer timeouts. Did you get the time breakdown from the request log? Thanks, Jun On Fri, Dec 20, 2013 at 6:14 PM, Tom Amon wrote: > I figured out the cause but I'm stumped as to the reason. If the brokers do > _not_ have the replica setting

Re: Migrating a cluster from 0.8.0 to 0.8.1

2013-12-23 Thread Neha Narkhede
Are you hard killing the brokers? And is this issue reproducible? On Sat, Dec 21, 2013 at 11:39 AM, Drew Goya wrote: > Hey guys, another small issue to report for 0.8.1. After a couple of days 3 > of my brokers had fallen off the ISR list for 2-3 of their partitions. > > I didn't see anything u

Re: Migrating a cluster from 0.8.0 to 0.8.1

2013-12-23 Thread Drew Goya
Occasionally I do have to hard kill brokers; the kafka-server-stop.sh script stopped working for me a few months ago. I saw another thread on the mailing list mentioning the issue too. I'll change the signal back to SIGTERM and run that way for a while; hopefully the problem goes away. This is t

Re: upgrade from beta1 to 0.81

2013-12-23 Thread Drew Goya
I'm not sure if I had hard killed the broker but I do have the fix for that case. I currently have this commit deployed: commit 87efda7f818218e0868be7032c73c994d75931fd Author: Guozhang Wang Date: Fri Nov 22 09:16:39 2013 -0800 kafka-1103; Consumer uses two zkclients; patched by Guozhang

Re: Consumer Group Rebalance Issues

2013-12-23 Thread Drew Goya
Thanks, I migrated our ZK cluster over to 3.3 this weekend. Hopefully that does it! On Fri, Dec 20, 2013 at 9:09 AM, Jun Rao wrote: > Hmm, not sure how stable 3.4.4 is. We have been using 3.3.4 and haven't seen > issues with ZK as long as there aren't many ZK session expirations. > > Thanks, > >

Re: Data loss in case of request.required.acks set to -1

2013-12-23 Thread Guozhang Wang
Hanish, Originally when you create the two partitions their leadership should be evenly distributed to two brokers, i.e. one broker gets one partition. But in your case broker 1 is the leader for both partition 1 and 0, and from the replica list broker 0 should originally be the leader for partit

Re: Migrating a cluster from 0.8.0 to 0.8.1

2013-12-23 Thread Guozhang Wang
Hi Drew, I tried the kafka-server-stop script and it worked for me. Wondering which OS are you using? Guozhang On Mon, Dec 23, 2013 at 10:57 AM, Drew Goya wrote: > Occasionally I do have to hard kill brokers, the kafka-server-stop.sh > script stopped working for me a few months ago. I saw an

Re: Data loss in case of request.required.acks set to -1

2013-12-23 Thread Jason Rosenberg
Is it possible to programmatically expose the number of brokers in ISR for each partition? We could make this a gating check before shutting down a broker gracefully, to make sure things are in good shape. I guess controlled shutdown assures this anyway, in a sense. Jason On Mon, Dec 23

Re: Consumer Group Rebalance Issues

2013-12-23 Thread Jason Rosenberg
We recently upgraded to 3.4.5, so far without incident. But I'd be interested to know whether anyone can confirm that there are known problems with it! Jason On Mon, Dec 23, 2013 at 2:04 PM, Drew Goya wrote: > Thanks, I migrated our ZK cluster over to 3.3 this weekend. Hopefully that > does it! >

Re: Data loss in case of request.required.acks set to -1

2013-12-23 Thread Guozhang Wang
Controlled shutdown will wait for the brokers that are going to take over the partitions the shutting-down broker is handling to be in sync before doing the migration, so yes, it will assure this. To check the number of brokers in ISR you can use the kafka-topics tool. Guozhang On Mon, Dec 23, 2013
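The ISR check discussed in this thread could be scripted roughly as follows. This is only a sketch, not something posted in the thread: it assumes the describe output of the kafka-topics tool prints one line per partition in the form `Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ...` (verify this against your Kafka version), and the names `parse_describe`, `under_replicated`, and `safe_to_stop`, the sample text, and the gating policy itself are all hypothetical.

```python
import re
from collections import namedtuple

PartitionState = namedtuple("PartitionState", "topic partition leader replicas isr")

# ASSUMPTION: per-partition line layout of the describe output; check it
# against the output of your own Kafka version before relying on this.
_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def _ids(csv):
    """Turn '1,0' into [1, 0]; an empty field becomes []."""
    return [int(x) for x in csv.split(",") if x]


def parse_describe(text):
    """Return one PartitionState per partition line; other lines are skipped."""
    states = []
    for line in text.splitlines():
        m = _LINE.search(line)
        if m is None:
            continue  # e.g. the per-topic summary line
        states.append(PartitionState(
            m.group("topic"), int(m.group("partition")), int(m.group("leader")),
            _ids(m.group("replicas")), _ids(m.group("isr"))))
    return states


def under_replicated(states):
    """Partitions whose ISR is smaller than their replica list."""
    return [s for s in states if len(s.isr) < len(s.replicas)]


def safe_to_stop(states, broker_id):
    """Hypothetical gate: every partition hosted on broker_id must have
    at least one *other* broker in its ISR."""
    return all(
        any(b != broker_id for b in s.isr)
        for s in states if broker_id in s.replicas
    )


# Made-up sample text in the assumed layout (not real cluster output).
SAMPLE = """\
Topic:events PartitionCount:2 ReplicationFactor:2 Configs:
 Topic: events Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
 Topic: events Partition: 1 Leader: 1 Replicas: 0,1 Isr: 1
"""

states = parse_describe(SAMPLE)
```

With the made-up sample, partition 1 is reported as under-replicated, and the gate refuses to stop broker 1 because it is the only in-sync replica for that partition. Controlled shutdown already performs an equivalent wait on the broker side, so a script like this would only be an extra pre-check.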

Re: Migrating a cluster from 0.8.0 to 0.8.1

2013-12-23 Thread Drew Goya
We are running on an Amazon Linux AMI, this is our specific version: Linux version 2.6.32-220.23.1.el6.centos.plus.x86_64 ( mockbu...@c6b5.bsys.dev.centos.org) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Tue Jun 19 04:14:37 BST 2012 On Mon, Dec 23, 2013 at 11:24 AM, Guozhang Wan

understanding OffsetOutOfRangeException's....

2013-12-23 Thread Jason Rosenberg
In our broker logs, we occasionally see errors like this: 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2] server.KafkaApis - [KafkaApi-45] Error when processing fetch request for partition [mytopic,0] offset 204243601 from consumer with correlation id 130341 kafka.common.OffsetOutOfRangeEx

Re: Migrating a cluster from 0.8.0 to 0.8.1

2013-12-23 Thread Drew Goya
Hey All, another thing to report for my 0.8.1 migration. I am seeing these errors occasionally right after I run a leader election. This looks to be related to KAFKA-860 as it is the same exception. I see this issue was closed a while ago though and I should be running a commit with the fix in.

Re: understanding OffsetOutOfRangeException's....

2013-12-23 Thread Guozhang Wang
Jason, You can search the correlation id in the public access log on the servers to get the consumer information. As for logging, I agree that we should use the same level on both sides. Could you file a jira for this? Guozhang On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg wrote: > In our

Re: understanding OffsetOutOfRangeException's....

2013-12-23 Thread Jason Rosenberg
Hi Guozhang, I'm not sure I understand your first answer. I don't see anything regarding the correlation id elsewhere in the broker logs. They only show up in those ERROR messages. I do see correlation ids in clients, but not on the broker. Jason On Mon, Dec 23, 2013 at 6:46 PM, G

Re: Data loss in case of request.required.acks set to -1

2013-12-23 Thread Hanish Bansal
On Tue, Dec 24, 2013 at 12:52 AM, Guozhang Wang wrote: > Hanish, > > Originally when you create the two partitions their leadership should be > evenly distributed to two brokers, i.e. one broker gets one partition. > But in your case broker 1 is the leader for both partition 1 and 0, and > from

Re: Data loss in case of request.required.acks set to -1

2013-12-23 Thread Hanish Bansal
Sorry, the last message was sent by mistake. Hi Guozhang, Please find my comments below: On Tue, Dec 24, 2013 at 12:52 AM, Guozhang Wang wrote: > Hanish, > > Originally when you create the two partitions their leadership should be > evenly distributed to two brokers, i.e. one broker gets one parti

Re: Migrating a cluster from 0.8.0 to 0.8.1

2013-12-23 Thread Jun Rao
This is probably related to KAFKA-1154. Could you upgrade to the latest trunk? Thanks, Jun On Mon, Dec 23, 2013 at 3:21 PM, Drew Goya wrote: > Hey All, another thing to report for my 0.8.1 migration. I am seeing these > errors occasionally right after I run a leader election. This looks t

Re: understanding OffsetOutOfRangeException's....

2013-12-23 Thread Jun Rao
Did you enable the request log? It logs the IP of every request. Thanks, Jun On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg wrote: > Hi Guozhang, > > I'm not sure I understand your first answer. I don't see anything > regarding the correlation id elsewhere in the broker logs. They only > s

Re: Migrating a cluster from 0.8.0 to 0.8.1

2013-12-23 Thread Drew Goya
Thanks for the help with all this stuff guys! I completed a rolling upgrade to trunk/b23cf19 and was able to issue a re-election without any brokers dropping out of the ISR list. On Mon, Dec 23, 2013 at 8:43 PM, Jun Rao wrote: > This is probably related to KAFKA-1154. Could you upgrade to the

Re: understanding OffsetOutOfRangeException's....

2013-12-23 Thread Jason Rosenberg
Hmmm, it looks like I'm enabling all logging at INFO, and the request logging is only done at TRACE (why is that?). I suppose one wouldn't normally want to see request logs, so by default, they aren't enabled? On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao wrote: > Did you enable request log? It lo

Understanding the min fetch rate metric

2013-12-23 Thread Jason Rosenberg
I'm realizing I'm not quite sure what the 'min fetch rate' metric indicates for consumers. Can someone offer an explanation? Is it related to the 'max lag' metric? Jason