NotLeaderForPartitionException after creating a topic

2013-11-01 Thread Henry Ma
Hello, I am using 0.8.0-beta1, with two brokers. After using bin/kafka-create-topic.sh to create a topic named "ead_click", I can see with bin/kafka-list-topic.sh that the topic seems to have been created successfully: [2013-11-01 15:49:35,388] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient

consumer null pointer exception

2013-11-01 Thread Kane Kane
I've got the following exception running produce/consume loop for several hours, that was just single exception, but during that time both producers and consumers slowed down a lot. After that looks like everything works fine, though i have suspicion some messages were lost. Can anyone explain what

uncontrolled shutdown

2013-11-01 Thread Kane Kane
When machine with kafka dies, most often broker cannot start itself with errors about index files being corrupt. After i delete them manually broker usually can boot up successfully. Shouldn't kafka try to delete/rebuild broken index files itself? Also this exception looks a bit weird: java.lang.I

Re: Topic creation on restart

2013-11-01 Thread Neha Narkhede
It is proportional to the number of topics but still seems too long. Did the broker have high queue time on all requests? Also how many topics existed on this cluster? Thanks, Neha On Oct 31, 2013 10:56 PM, "Jason Rosenberg" wrote: > In this case, it appears to have gone on for 104 seconds. Sho

Has there been a release 0.8.0?

2013-11-01 Thread Maier, Dr. Andreas
Hello, This might be a stupid question, but I'm seeing a new tag 0.8.0 in the kafka git repository. Does this mean, a final 0.8.0 release has been created and can be fetched using this tag? I couldn't find an official announcement though. Should one better wait for it, or did I overlook it? Since

Re: Has there been a release 0.8.0?

2013-11-01 Thread Neha Narkhede
It is undergoing voting on release candidate 1 until November 4th. So yes, very close to being released. Thanks, Neha On Nov 1, 2013 3:10 AM, "Maier, Dr. Andreas" wrote: > Hello, > > This might be a stupid question, but > I'm seeing a new tag 0.8.0 in the kafka git repository. > Does this mean,

Re: uncontrolled shutdown

2013-11-01 Thread Neha Narkhede
When you say the Kafka dies, did you mean a kill -9 or kill -15 ? I agree that Kafka should try and delete/rebuild index files, but in this case, it sounds like a bug. Can you file a JIRA? Thanks, Neha On Fri, Nov 1, 2013 at 1:51 AM, Kane Kane wrote: > When machine with kafka dies, most often

Re: consumer null pointer exception

2013-11-01 Thread Neha Narkhede
I think you are hitting https://issues.apache.org/jira/browse/KAFKA-824. Was the consumer being shutdown at that time? On Fri, Nov 1, 2013 at 1:28 AM, Kane Kane wrote: > I've got the following exception running produce/consume loop for several > hours, > that was just single exception, but duri

Re: NotLeaderForPartitionException after creating a topic

2013-11-01 Thread Neha Narkhede
The following exception seems suspicious and I think it is fixed in 0.8 HEAD. Do you mind trying 0.8 HEAD and see if this can still be reproduced? java.util.NoSuchElementException: key not found: /disk1/test/kafka-data at scala.collection.MapLike$class.default(MapLike.scala:223) at scal

WARN Property log.cleanup.interval.mins is not valid

2013-11-01 Thread Viktor Kolodrevskiy
Hi guys, While running latest kafka trunk code I'm getting warning in server.log: [2013-11-01 15:06:11,419] WARN Property log.cleanup.interval.mins is not valid (kafka.utils.VerifiableProperties) Do I need to ignore this message? -- Thanks, Viktor

Kafka for *critical* data

2013-11-01 Thread Richard Rodseth
I'm excited about Kafka but want to be sure it is ready (or will be soon) for a critical data pipeline. What are the showstoppers, if any? eg. https://issues.apache.org/jira/browse/KAFKA-156 mentioned here: http://stackoverflow.com/questions/12130481/is-kafka-ready-for-production-use/12764663#1

Re: Purgatory

2013-11-01 Thread Joe Stein
Priya, if you want you can look at RequestPurgatory.scala for some more details. The config is the size of the atomic requestCounter. Basically the purge in the purgatory is a way to check if the request has been satisfied and delayed and can get removed. It is a background scan when the size re

Re: Kafka for *critical* data

2013-11-01 Thread Joe Stein
Richard, KAFKA-156 is something that the client could take care of. It is 2 years old and a bit of a hack suggestion for what Replication in 0.8.0 has become and I would argue we close it (we should go through all the ticket prior to 0.9 and chat about them, separate thread I will start on dev).

Re: uncontrolled shutdown

2013-11-01 Thread Kane Kane
Neha, yes when i kill it with -9, sure I will file a bug. Thanks. On Fri, Nov 1, 2013 at 3:43 AM, Neha Narkhede wrote: > When you say the Kafka dies, did you mean a kill -9 or kill -15 ? I agree > that Kafka should try and delete/rebuild index files, but in this case, it > sounds like a bug. Ca

Re: consumer null pointer exception

2013-11-01 Thread Kane Kane
Hello Neha, I think i might be hitting this. I didn't shutdown the consumer (at least intentionally). Basically it was just an attempt to pipe ~1T through kafka, i would wild guess it's related to log expansion. Because around the time it happened i saw messages about expanding log, is it possible

Re: SimpleConsumer cannot read KeyedMessage.

2013-11-01 Thread Jun Rao
Did you check the error code associated with each partition in the fetch response? Thanks, Jun On Thu, Oct 31, 2013 at 9:59 PM, Lu Xuechao wrote: > No. The simple consumer does receive some responses and can iterate the > loop: > > for (MessageAndOffset messageAndOffset : fetchResponse.messag

Re: WARN Property log.cleanup.interval.mins is not valid

2013-11-01 Thread Jun Rao
It seems in trunk log.cleanup.interval.mins is renamed to log.retention.check.interval.ms. We should put in the new name in config/server.properties. Could you file a jira? Thanks, Jun On Fri, Nov 1, 2013 at 6:10 AM, Viktor Kolodrevskiy < viktor.kolodrevs...@gmail.com> wrote: > Hi guys, > > Wh
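
A minimal sketch of the rename Jun describes, for anyone updating an existing config/server.properties by hand. The two property names come from the message above; the helper names (`migrate_property_line`, `migrate_properties`) are hypothetical, and the minutes-to-milliseconds conversion is an assumption read off the `.mins` and `.ms` suffixes rather than something confirmed in the thread.

```python
# Property names taken from the thread; everything else here is illustrative.
OLD_KEY = "log.cleanup.interval.mins"
NEW_KEY = "log.retention.check.interval.ms"


def migrate_property_line(line: str) -> str:
    """Rewrite one properties line if it sets the old key; otherwise return it unchanged."""
    stripped = line.strip()
    # Leave comments and lines without a key=value pair alone.
    if stripped.startswith("#") or "=" not in stripped:
        return line
    key, value = stripped.split("=", 1)
    if key.strip() != OLD_KEY:
        return line
    # Assumption: the old value is in minutes and the new one in milliseconds.
    minutes = int(value.strip())
    return f"{NEW_KEY}={minutes * 60 * 1000}"


def migrate_properties(text: str) -> str:
    """Apply the rename to every line of a server.properties file body."""
    return "\n".join(migrate_property_line(line) for line in text.splitlines())


if __name__ == "__main__":
    print(migrate_properties("broker.id=0\nlog.cleanup.interval.mins=1"))
```

Commented-out lines are left untouched on purpose, so a disabled setting is not silently re-enabled under the new name.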

Re: uncontrolled shutdown

2013-11-01 Thread Jun Rao
Are you using the latest code in the 0.8 branch? Thanks, Jun On Fri, Nov 1, 2013 at 7:36 AM, Kane Kane wrote: > Neha, yes when i kill it with -9, sure I will file a bug. > > Thanks. > > > On Fri, Nov 1, 2013 at 3:43 AM, Neha Narkhede >wrote: > > > When you say the Kafka dies, did you mean a

Re: uncontrolled shutdown

2013-11-01 Thread Kane Kane
Hello Jun Rao, no, it's not the head; I compiled it a couple of weeks ago. Should I try with the latest? On Fri, Nov 1, 2013 at 7:58 AM, Jun Rao wrote: > Are you using the latest code in the 0.8 branch? > > Thanks, > > Jun > > > On Fri, Nov 1, 2013 at 7:36 AM, Kane Kane wrote: > > > Neha, yes w

Re: uncontrolled shutdown

2013-11-01 Thread Neha Narkhede
Jun, I've seen this happen in the latest code in trunk, need to remember how to reproduce it. Thanks, Neha On Nov 1, 2013 7:59 AM, "Jun Rao" wrote: > Are you using the latest code in the 0.8 branch? > > Thanks, > > Jun > > > On Fri, Nov 1, 2013 at 7:36 AM, Kane Kane wrote: > > > Neha, yes when

Re: uncontrolled shutdown

2013-11-01 Thread Kane Kane
Filed a bug meanwhile, https://issues.apache.org/jira/browse/KAFKA-1112 On Fri, Nov 1, 2013 at 8:15 AM, Neha Narkhede wrote: > Jun, > > I've seen this happen in the latest code in trunk, need to remember how to > reproduce it. > > Thanks, > Neha > On Nov 1, 2013 7:59 AM, "Jun Rao" wrote: > > >

Re: uncontrolled shutdown

2013-11-01 Thread Guozhang Wang
Currently the index files will only be deleted on startup if there are any .swap file indicating the server crashed while opening the log segments. We should probably change this logic. Guozhang On Fri, Nov 1, 2013 at 8:16 AM, Kane Kane wrote: > Filed a bug meanwhile, https://issues.apache.org

Re: Kafka for *critical* data

2013-11-01 Thread Olivier Pomel
Richard - I can't speak about 0.8x, but we've been using Kafka in a most critical capacity on production since the early days @ Datadog. And... - It never failed (!) - It scaled predictably - Its performance was consistent - The simplicity of its design made it really easy to reason about We

Re: consumer null pointer exception

2013-11-01 Thread Neha Narkhede
Could you send the entire consumer log? On Fri, Nov 1, 2013 at 7:45 AM, Kane Kane wrote: > Hello Neha, I think i might be hitting this. I didn't shutdown the consumer > (at least intentionally). Basically it was just an attempt to pipe ~1T > through kafka, i would wild guess it's related to log

Re: WARN Property log.cleanup.interval.mins is not valid

2013-11-01 Thread Viktor Kolodrevskiy
Done! https://issues.apache.org/jira/browse/KAFKA-1113 -- Thanks, Viktor 2013/11/1 Jun Rao : > It seems in trunk log.cleanup.interval.mins is renamed to > log.retention.check.interval.ms. We should put in the new name in > config/server.properties. Could you file a jira? > > Thanks, > > Jun > >

Re: Kafka for *critical* data

2013-11-01 Thread Surendranauth Hiraman
I'm a fan of kafka as well. We've been using 0.7.2 for about a year. I recommend it strongly. But I will point one thing. Not an issue with Kafka itself but when the client side has failed, choosing what offset to reset to is not an exact science. You will have to decide how much data loss or dat
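
One way to picture the offset-reset decision mentioned above, as a sketch only. It assumes offsets mean "next message to read", and it takes the true processing position as an input even though a failed client typically does not know it (which is why the choice is inexact). The function name, the policy names "replay" and "skip", and the numbers are all hypothetical.

```python
def reset_tradeoff(last_checkpoint: int, last_processed: int, log_end: int, policy: str):
    """Return (start_offset, duplicates, lost) for a given reset policy.

    last_checkpoint: offset the client last saved before failing.
    last_processed:  offset the client had really reached (unknown in practice).
    log_end:         offset of the next message the broker will append.
    """
    if policy == "replay":
        # Restart from the saved checkpoint: nothing is skipped, some work repeats.
        start = last_checkpoint
    elif policy == "skip":
        # Restart from the end of the log: nothing repeats, unread messages are skipped.
        start = log_end
    else:
        raise ValueError(f"unknown policy: {policy}")
    duplicates = max(0, last_processed - start)
    lost = max(0, start - last_processed)
    return start, duplicates, lost


if __name__ == "__main__":
    for policy in ("replay", "skip"):
        print(policy, reset_tradeoff(100, 130, 200, policy))
```

With a checkpoint at 100, real progress at 130, and a log end of 200, "replay" reprocesses 30 messages and "skip" drops 70; checkpointing more often shrinks the first number.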

Re: Ganglia Metrics Reporter

2013-11-01 Thread Andrew Otto
Hmm, It looks as though adding multicast support is more an issue with the Metrics code rather than with kafka-ganglia. The only way I could see to tell GangliaReporter to use multicast when sending messages was to pass it a GangliaMessageBuilder that constructed GangliaMessages using a proper

Re: consumer null pointer exception

2013-11-01 Thread Kane Kane
Hello Neha, rest of the log goes to my consumer code, are you interested in? It's a little bit modified version of ConsoleConsumer. Thanks. On Fri, Nov 1, 2013 at 9:24 AM, Neha Narkhede wrote: > Could you send the entire consumer log? > > > On Fri, Nov 1, 2013 at 7:45 AM, Kane Kane wrote: > >

Re: SimpleConsumer cannot read KeyedMessage.

2013-11-01 Thread Lu Xuechao
checked fetchResponse.hasError() but has no error. On Fri, Nov 1, 2013 at 7:45 AM, Jun Rao wrote: > Did you check the error code associated with each partition in the fetch > response? > > Thanks, > > Jun > > > On Thu, Oct 31, 2013 at 9:59 PM, Lu Xuechao wrote: > > > No. The simple consumer do

Re: Topic creation on restart

2013-11-01 Thread Jason Rosenberg
Neha, This cluster has on the order of 750 topics. It looks like if I add a 20 second delay before placing a broker into the vip for metadata requests, it never seems to have this issue. So I'm not sure about the 104 seconds number, other than that was how long the flood of "Topic creation" log

Re: Purgatory

2013-11-01 Thread Marc Labbe
Guozhang, I have to agree with Priya the doc isn't very clear. Although the configuration is documented, it is simply rewording the name of the config, which isn't particularly useful if you want more information about what the purgatory is. I searched the whole wiki and doc and could not find any

Incorrect JMX MBean name on Kafka doc page

2013-11-01 Thread Andrew Otto
In http://kafka.apache.org/documentation.html#monitoring, ISR expansion rate "kafka.server":name="ISRShrinksPerSec",type="ReplicaManager" See above. I believe this should be "kafka.server":name="IsrExpandsPerSec",type="ReplicaManager" -Andrew Otto

Re: Purgatory

2013-11-01 Thread Joel Koshy
Marc, thanks for writing that up. I think it is worth adding some details on the request-purgatory on a wiki (Jay had started a wiki page for kafka internals [1] a while ago, but we have not had time to add much to it since.) Your write-up could be reviewed and added there. Do you have edit permiss

Re: Controlled shutdown failure, retry settings

2013-11-01 Thread Joel Koshy
Unclean shutdown could result in data loss - since you are moving leadership to a replica that has fallen out of ISR. i.e., its log end offset is behind the last committed message to this partition. >>> But if data is written with 'request.required.acks=-1', no data s

Re: Incorrect JMX MBean name on Kafka doc page

2013-11-01 Thread Neha Narkhede
Thanks for catching that. Pushed the fix. -Neha On Fri, Nov 1, 2013 at 12:58 PM, Andrew Otto wrote: > In http://kafka.apache.org/documentation.html#monitoring, > > ISR expansion rate > "kafka.server":name="ISRShrinksPerSec",type="ReplicaManager"See above > > > I believe this should be > >

Re: How to run a single unit test with ./sbt

2013-11-01 Thread Joel Koshy
Sorry no clue about that - anyone else know? On Mon, Oct 28, 2013 at 10:41 AM, Roger Hoover wrote: > Joel, > > Thank you! This is very helpful. > > What I notice now is that it works for Test classes that > extend org.scalatest.junit.JUnit3Suite. There are other tests in the > codebase that us

Re: Topic creation on restart

2013-11-01 Thread Neha Narkhede
The mbeans are explained here - http://kafka.apache.org/documentation.html#monitoring. Look for *QueueTimeMs Thanks, Neha On Fri, Nov 1, 2013 at 12:14 PM, Jason Rosenberg wrote: > Neha, > > This cluster has on the order of 750 topics. > > It looks like if I add a 20 second delay before placing

Re: Controlled shutdown failure, retry settings

2013-11-01 Thread Neha Narkhede
For supporting more durability at the expense of availability, we have a JIRA that we will fix on trunk. This will allow you to configure the default as well as per topic durability vs availability behavior - https://issues.apache.org/jira/browse/KAFKA-1028 Thanks, Neha On Fri, Nov 1, 2013 at 1

Re: uncontrolled shutdown

2013-11-01 Thread Guozhang Wang
Hello Kane, do you mind attach your index file causing the issue if you still have it? Guozhang On Fri, Nov 1, 2013 at 8:42 AM, Guozhang Wang wrote: > Currently the index files will only be deleted on startup if there are any > .swap file indicating the server crashed while opening the log seg

Re: uncontrolled shutdown

2013-11-01 Thread Kane Kane
Hello, yes, I can easily reproduce it, will send it to you asap. Thanks. On Fri, Nov 1, 2013 at 2:05 PM, Guozhang Wang wrote: > Hello Kane, do you mind attach your index file causing the issue if you > still have it? > > Guozhang > > > On Fri, Nov 1, 2013 at 8:42 AM, Guozhang Wang wrote: > >

Re: SimpleConsumer cannot read KeyedMessage.

2013-11-01 Thread Jun Rao
Which offset did you use for fetching? Is there data in the kafka log dir? Thanks, Jun On Fri, Nov 1, 2013 at 11:48 AM, Lu Xuechao wrote: > checked fetchResponse.hasError() but has no error. > > > On Fri, Nov 1, 2013 at 7:45 AM, Jun Rao wrote: > > > Did you check the error code associated wi

compiling with 2.10

2013-11-01 Thread Kane Kane
I think addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8") Should be updated to 0.9.0 at least to successfully compile. I've had an issue with assembly-package-dependency.

Re: SimpleConsumer cannot read KeyedMessage.

2013-11-01 Thread Lu Xuechao
The consumer starts from offset 0. Yes, in the log dir. On Fri, Nov 1, 2013 at 4:06 PM, Jun Rao wrote: > Which offset did you use for fetching? Is there data in the kafka log dir? > > Thanks, > > Jun > > > On Fri, Nov 1, 2013 at 11:48 AM, Lu Xuechao wrote: > > > checked fetchResponse.hasError(

Re: Purgatory

2013-11-01 Thread Marc Labbe
Hi Joel, I used to have edit access to the wiki; I made a few additions to it a while ago, but it seems I don't have it anymore. It might have been lost in the confluence update. I would be glad to add what I have written if I get it back. Otherwise, feel free to paste my words in one of the pages, I don

Re: consumer null pointer exception

2013-11-01 Thread Neha Narkhede
Yes, could you send around the log4j file of the consumer around the time of the error in question. Thanks, Neha On Fri, Nov 1, 2013 at 11:35 AM, Kane Kane wrote: > Hello Neha, rest of the log goes to my consumer code, are you interested > in? It's a little bit modified version of ConsoleConsu

Re: Purgatory

2013-11-01 Thread Joe Stein
To edit the Wiki you need to send an ICLA http://www.apache.org/licenses/#clas to Apache and then once that is done an email to priv...@kafka.apache.org (or to me and I will copy private) with your Wiki username and that you sent the ICLA to Apache. Then, I can add you to edit the Wiki. /

Re: compiling with 2.10

2013-11-01 Thread Jun Rao
Does the problem exist with trunk? If so, could you open a jira and submit a patch? Thanks, Jun On Fri, Nov 1, 2013 at 4:14 PM, Kane Kane wrote: > I think > addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8") > > Should be updated to 0.9.0 at least to successfully compile. I've had > an i

Re: SimpleConsumer cannot read KeyedMessage.

2013-11-01 Thread Jun Rao
Did you make sure the fetch size in the fetch request is larger than the size of a single message? Thanks, Jun On Fri, Nov 1, 2013 at 5:07 PM, Lu Xuechao wrote: > The consumer starts from offset 0. Yes, in the log dir. > > > On Fri, Nov 1, 2013 at 4:06 PM, Jun Rao wrote: > > > Which offset d

Re: compiling with 2.10

2013-11-01 Thread Kane Kane
Yes, I've had problem, which resolved with updating sbt-assembly. Will open a ticket and provide a patch. On Fri, Nov 1, 2013 at 8:43 PM, Jun Rao wrote: > Does the problem exist with trunk? If so, could you open a jira and submit > a patch? > > Thanks, > > Jun > > > On Fri, Nov 1, 2013 at 4:14 PM

too many open files - broker died

2013-11-01 Thread Kane Kane
I had only 1 topic with 45 partitions replicated across 3 brokers. After several hours of uploading some data to kafka 1 broker died with the following exception. I guess i can fix it raising limit for open files, but I wonder how it happened under described circumstances. [2013-11-02 00:19:14,86
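
A back-of-the-envelope sketch of how a single topic could still exhaust the open-file limit. It assumes the broker keeps file handles open for the log segments it hosts plus one per network connection; the segment count, connection count, handles-per-segment multiplier, and the limit of 1024 are illustrative guesses, not figures from this thread.

```python
def estimate_open_files(partitions_on_broker: int,
                        segments_per_partition: int,
                        connections: int,
                        handles_per_segment: int = 1) -> int:
    """Rough count of descriptors held by one broker (all inputs are estimates)."""
    segment_handles = partitions_on_broker * segments_per_partition * handles_per_segment
    return segment_handles + connections


def exceeds_limit(estimate: int, nofile_limit: int) -> bool:
    """True when the estimate is above the per-process open-file limit."""
    return estimate > nofile_limit


if __name__ == "__main__":
    # 45 partitions is from the message above; 30 segments and 200 connections are guesses.
    estimate = estimate_open_files(45, 30, 200)
    print(estimate, exceeds_limit(estimate, 1024))
```

The point of the arithmetic is that the count grows with segments per partition, so hours of sustained writes can push a broker past a low limit even when the topic count stays at one.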

Re: Controlled shutdown failure, retry settings

2013-11-01 Thread Jason Rosenberg
In response to Joel's point, I think I do understand that messages can be lost, if in fact we have dropped down to only 1 member in the ISR at the time the message is written, and then that 1 node goes down. What I'm not clear on, is the conditions under which a node can drop out of the ISR. You

Re: consumer null pointer exception

2013-11-01 Thread Kane Kane
Unfortunately I've recompiled and installed kafka from scratch today, deleting all data. I'm still doing these tests, will send you stacktrace if I would be able to repro it. Thanks for help! On Fri, Nov 1, 2013 at 6:13 PM, Neha Narkhede wrote: > Yes, could you send around the log4j file of the