Re: Question about recovering from outage
I'm running Kafka 0.8.

/ Jonas

2014-05-18 23:45 GMT+02:00 Jonas Bergström:

> Hi all, and thanks for a fantastic product.
>
> The other day our Kafka node in our test environment went down due to a full
> disk. I reconfigured Kafka to retain fewer messages and restarted the node.
> It is a single-node setup. At restart the node freed up some disk space,
> but no new messages were accepted. In the log we saw this:
>
> WARN [KafkaApi-0] Produce request with correlation id 12680557 from
> client on partition [logs,0] failed due to Partition [logs,0] doesn't
> exist on 0 (kafka.server.KafkaApis)
>
> List-topics showed:
>
> topic: logs partition: 0 leader: 0 replicas: 0 isr: 0
>
> which seemed fine, but I figured I might have to reassign the topic
> partition anyway, so I did. Nothing seemed to happen, either in the logs
> or in the status. Then I had something else to take care of for a while, and
> realized about 30 minutes later that the node had started working again!
>
> Is this expected behavior? How long does a node take to "get online" again
> after a crash-restart? Is there a way to tell that the node is on its way
> up?
>
> Thanks / Jonas
Re: ISR not updating
The value of under replicated partitions is 0 across the cluster.

Thanks,
Shone

On Mon, May 19, 2014 at 12:23 AM, Jun Rao wrote:

> What's the value of the under replicated partitions JMX in each broker?
>
> Thanks,
>
> Jun
>
> On Sat, May 17, 2014 at 6:16 PM, Paul Mackles wrote:
>
>> Today we did a rolling restart of ZK. We also restarted the Kafka
>> controller and the ISRs are still not being updated in ZK. Again, the cluster
>> seems fine and the replicas in question do appear to be getting updated. I
>> am guessing there must be some bad state persisted in ZK.
>>
>> On 5/17/14 7:50 PM, "Shone Sadler" wrote:
>>
>>> Hi Jun,
>>>
>>> I work with Paul and am monitoring the cluster as well. The status has
>>> not changed.
>>>
>>> When we execute kafka-list-topic we see the following (showing one of
>>> the two partitions having the problem):
>>>
>>> topic: t1 partition: 33 leader: 1 replicas: 1,2,3 isr: 1
>>>
>>> When inspecting the logs of the leader, I do see a spurt of ISR
>>> shrinkage/expansion around the time that the brokers were partitioned from
>>> ZK, but nothing past the last message "Cached zkVersion [17] not equal to
>>> that in zookeeper." from yesterday. There are no constant changes to the
>>> ISR list.
>>>
>>> Is there a way to force the leader to update ZK with the latest ISR list?
>>>
>>> Thanks,
>>> Shone
>>>
>>> Logs:
>>>
>>> cat server.log | grep "\[t1,33\]"
>>>
>>> [2014-04-18 10:16:32,814] INFO [ReplicaFetcherManager on broker 1] Removing
>>> fetcher for partition [t1,33] (kafka.server.ReplicaFetcherManager)
>>> [2014-05-13 19:42:10,784] ERROR [KafkaApi-1] Error when processing fetch
>>> request for partition [t1,33] offset 330118156 from consumer with
>>> correlation id 0 (kafka.server.KafkaApis)
>>> [2014-05-14 11:02:25,255] ERROR [KafkaApi-1] Error when processing fetch
>>> request for partition [t1,33] offset 332896470 from consumer with
>>> correlation id 0 (kafka.server.KafkaApis)
>>> [2014-05-16 12:00:11,344] INFO Partition [t1,33] on broker 1: Shrinking ISR
>>> for partition [t1,33] from 3,1,2 to 1 (kafka.cluster.Partition)
>>> [2014-05-16 12:00:18,009] INFO Partition [t1,33] on broker 1: Cached
>>> zkVersion [17] not equal to that in zookeeper, skip updating ISR
>>> (kafka.cluster.Partition)
>>> [2014-05-16 13:33:11,344] INFO Partition [t1,33] on broker 1: Shrinking ISR
>>> for partition [t1,33] from 3,1,2 to 1 (kafka.cluster.Partition)
>>> [2014-05-16 13:33:12,403] INFO Partition [t1,33] on broker 1: Cached
>>> zkVersion [17] not equal to that in zookeeper, skip updating ISR
>>> (kafka.cluster.Partition)
>>>
>>> On Sat, May 17, 2014 at 11:44 AM, Jun Rao wrote:
>>>
>>>> Do you see constant ISR shrinking/expansion of those two partitions in the
>>>> leader broker's log?
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>> On Fri, May 16, 2014 at 4:25 PM, Paul Mackles wrote:
>>>>
>>>>> Hi - We are running kafka_2.8.0-0.8.0-beta1 (we are a little behind in
>>>>> upgrading).
>>>>>
>>>>> From what I can tell, connectivity to ZK was lost for a brief period. The
>>>>> cluster seemed to recover OK except that we now have 2 (out of 125)
>>>>> partitions where the ISR appears to be out of date. In other words,
>>>>> kafka-list-topic is showing only one replica in the ISR for the 2
>>>>> partitions in question (there should be 3).
>>>>>
>>>>> What's odd is that in looking at the log segments for those partitions on
>>>>> the file system, I can see that they are in fact getting updated and by
>>>>> all measures look to be in sync. I can also see that the brokers where the
>>>>> out-of-sync replicas reside are doing fine and leading other partitions
>>>>> like nothing ever happened. Based on that, it seems like the ISR in ZK is
>>>>> just out-of-date due to a botched recovery from the brief ZK outage.
>>>>>
>>>>> Has anyone seen anything like this before? I saw this ticket which sounded
>>>>> similar:
>>>>>
>>>>> https://issues.apache.org/jira/browse/KAFKA-948
>>>>>
>>>>> Anyone have any suggestions for recovering from this state? I was thinking
>>>>> of running the preferred-replica-election tool next to see if that gets
>>>>> the ISRs in ZK back in sync.
>>>>>
>>>>> After that, I guess the next step would be to bounce the Kafka servers in
>>>>> question.
>>>>>
>>>>> Thanks,
>>>>> Paul
RE: starting off at a small scale, single ec2 instance with 7.5 GB RAM with kafka
Hi,

I like how Kafka operates, but I'm wondering if it is possible to run
everything on a single EC2 instance with 7.5 GB RAM. That would be Zookeeper
and a single Kafka broker. I would have a separate server to consume from the
broker. Producers would run on my web servers.

I don't want to complicate things, as I don't really need failover or
redundancy etc. I just want to keep things simple. I'll have a single topic,
and a few partitions, because I want the guarantee that the messages are in
order.

Is this something that would be really out of the norm and not recommended?
i.e. nobody really uses it this way and who knows what is going to happen? :)
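For reference, a single-broker setup like this only needs a handful of settings; below is a minimal server.properties sketch, assuming an 0.8-series broker, with placeholder paths and ZooKeeper address to adjust:

broker.id=0
port=9092
# placeholder data directory
log.dirs=/var/kafka-logs
# ZooKeeper running on the same instance
zookeeper.connect=localhost:2181
# default partition count for auto-created topics ("a few", as described above)
num.partitions=4
# retention window; shrink it if disk on the single instance is tight
log.retention.hours=168

Since everything shares one box, it is worth keeping the broker and ZooKeeper JVM heaps modest (for example around 1 GB each) so that most of the 7.5 GB stays available to the OS page cache, which Kafka leans on heavily for reads and writes.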
Make kafka storage engine pluggable and provide an HDFS plugin?
Hi there,

I recently started to use Kafka for our data analysis pipeline and it works
very well.

One problem for us so far is expanding our cluster when we need more storage
space. Kafka provides some scripts to help with this, but the process wasn't
smooth. To make it work perfectly, it seems Kafka needs to do some jobs that a
distributed file system has already done.

So I'm just wondering if there are any thoughts on making Kafka work on top of
HDFS? Maybe make the Kafka storage engine pluggable, with HDFS as one option?
The pros might be that HDFS already handles storage management (replication,
corrupted disks/machines, migration, load balancing, etc.) very well and it
frees Kafka and its users from that burden; the cons might be performance
degradation. As Kafka does very well on performance, it would probably still
be competitive for most situations even with some degree of degradation.

Best,
--
Hangjun Ye
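Purely as a strawman to make "pluggable" concrete (no such interface exists in Kafka today), the natural seam would be the per-partition log; a local-filesystem implementation and an HDFS-backed one would both have to satisfy something like the hypothetical SPI below.

import java.nio.ByteBuffer;

/**
 * Hypothetical storage SPI, sketched only to illustrate the proposal above.
 * Kafka does not define such an interface today.
 */
public interface PartitionLogStore {

    /** Append a batch of messages and return the offset assigned to the first one. */
    long append(ByteBuffer messages);

    /** Read up to maxBytes of messages starting at the given offset. */
    ByteBuffer read(long offset, int maxBytes);

    /** Oldest offset still retained after retention/cleanup. */
    long logStartOffset();

    /** Next offset that will be assigned on append. */
    long logEndOffset();

    /** Drop data below the given offset (retention). */
    void truncateBefore(long offset);

    /** Make previously appended data durable. */
    void flush();

    void close();
}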
Re: ISR not updating
Ok. That does indicate the ISR should include all replicas. Which version of
the ZK server are you using? Could you check the ZK server log to see if the
ISR is being updated?

Thanks,

Jun

On Mon, May 19, 2014 at 1:30 AM, Shone Sadler wrote:

> The value of under replicated partitions is 0 across the cluster.
>
> Thanks,
> Shone
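For comparing what the leader reports against what ZooKeeper actually has stored, a minimal sketch using the plain ZooKeeper Java client follows. The connect string, topic and partition are placeholders; the /brokers/topics/<topic>/partitions/<partition>/state path is the layout used by 0.8-series brokers, and its JSON payload carries the leader, the ISR, and sits on the znode whose version the "Cached zkVersion" log messages above refer to.

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class PartitionStateProbe {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        // Placeholder connect string; use the same zookeeper.connect as the brokers.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();
        try {
            // 0.8-style partition state znode; the payload is JSON listing leader and isr.
            Stat stat = new Stat();
            byte[] data = zk.getData("/brokers/topics/t1/partitions/33/state", false, stat);
            System.out.println("state: " + new String(data, "UTF-8"));
            System.out.println("znode version: " + stat.getVersion());
        } finally {
            zk.close();
        }
    }
}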
Re: Question about recovering from outage
Do you think you could upgrade to 0.8.1.1? It fixed a bunch of corner cases
in the controller.

Thanks,

Jun

On Mon, May 19, 2014 at 12:00 AM, Jonas Bergström wrote:

> I'm running Kafka 0.8.
>
> / Jonas
Kafka Migration Tool
Hi all,

I'm trying to migrate from Kafka 0.7.2-2.9.2 (with Zookeeper 3.3.4 from
Cloudera) to Kafka 0.8.1.1-2.9.2 (with official Zookeeper 3.4.5) - however I'm
hitting brick walls with a very mysterious problem:

6) at kafka.tools.KafkaMigrationTool.main(KafkaMigrationTool.java:217)
Caused by: java.lang.NumberFormatException: For input string:
""1400511498394","host""

I've attached logs, scripts and configs from everything that I'm trying to
run.

FYI: We have three servers for Kafka brokers and three servers for Zookeeper.
They are running on Staging (stag) as numbers s04, s05 and s06. I've only
given the properties for s04 as the other two are almost identical.

Thanks,
Cheers,
Mo.

(Attachment: start-migration.sh - Bourne shell script)
Re: What happens to Kafka when ZK lost its quorum?
Hi Weide/Connie,

I have added this entry in the FAQ, please let me know if anything on the wiki
is not clear to you.

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowareKafkabrokersdependonZookeeper ?

Guozhang

On Wed, May 14, 2014 at 8:15 AM, Neha Narkhede wrote:

> Kafka requires a functional and healthy zookeeper setup. It is recommended
> that you closely monitor your zookeeper cluster and provision it so that it
> is performant.
>
> Thanks,
> Neha
>
> On Tue, May 13, 2014 at 6:52 AM, Connie Yang wrote:
>
>> Hi all,
>>
>> Can Kafka producers, brokers and consumers still be processing messages and
>> functioning in their normal states if Zookeeper lost its quorum?
>>
>> Thanks,
>> Connie

--
Guozhang
Cause and Recovery
Hi,

I got this from the log:

[exec] 03:50:06.216 [ProducerSendThread-] ERROR k.producer.async.ProducerSendThread - Error in handling batch of 100 events
[exec] kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
[exec]   at kafka.producer.async.DefaultEventHandler.handle(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
[exec]   at kafka.producer.async.ProducerSendThread.tryToHandle(Unknown Source) [kafka_2.10-0.8.0.jar:0.8.0]
[exec]   at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
[exec]   at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
[exec]   at scala.collection.immutable.Stream.foreach(Stream.scala:547) ~[scala-library-2.10.1.jar:na]
[exec]   at kafka.producer.async.ProducerSendThread.processEvents(Unknown Source) [kafka_2.10-0.8.0.jar:0.8.0]
[exec]   at kafka.producer.async.ProducerSendThread.run(Unknown Source) [kafka_2.10-0.8.0.jar:0.8.0]

My questions are:

1. What could cause this?
2. Who should deal with the recovery? User or Kafka?

Let me know if more log is needed. (It hangs the integration test, so I have
3.5 GB of them ... from a maven build)

--
Best Regards,
Mingtao Zhang
Re: Cause and Recovery
Hi Mingtao,

1. Do you see any error/warn log entries before this one?
2. The producer will retry sending when the previous attempt is not
successful. In your case it has exhausted all the retries, so the messages in
that batch are dropped on the floor and lost.

Guozhang

On Mon, May 19, 2014 at 8:32 AM, Mingtao Zhang wrote:

> I got this from the log:
>
> [exec] kafka.common.FailedToSendMessageException: Failed to send
> messages after 3 tries.
>
> My question is:
>
> 1. What could cause this?
> 2. Who should deal with the recovery? User or Kafka?
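The retry behaviour mentioned above is configurable on the 0.8 producer; a minimal sketch with a placeholder broker list and topic is below. message.send.max.retries defaults to 3, which is where the "Failed to send messages after 3 tries" in the stack trace comes from, and retry.backoff.ms controls how long the producer waits before refreshing metadata and trying again.

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class RetryingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");     // wait for the partition leader to ack
        props.put("message.send.max.retries", "10"); // default is 3
        props.put("retry.backoff.ms", "500");        // pause between attempts
        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        try {
            producer.send(new KeyedMessage<String, String>("test-topic", "key", "value")); // placeholder topic
        } finally {
            producer.close();
        }
    }
}

Whatever the retry count, the caller still has to decide what to do once FailedToSendMessageException is finally thrown; Kafka itself will not redeliver those messages.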
Re: Question about recovering from outage
Ok, I'll upgrade.
Is there a way to see the status of a node that is recovering, e.g. in
zookeeper or via jmx?

/ Jonas

2014-05-19 16:49 GMT+02:00 Jun Rao:

> Do you think you could upgrade to 0.8.1.1? It fixed a bunch of corner cases
> in the controller.
>
> Thanks,
>
> Jun
Consistent replication of an event stream into Kafka
Hello,

We have a use case where we want to replicate an event stream which exists
outside of kafka into a kafka topic (single partition). The event stream has
sequence ids which always increase by 1. We want to preserve this ordering.

The difficulty is that we want the process that writes these events to fail
over automatically if it dies. While ZooKeeper can guarantee a single writer
at a given point in time, we are worried about delayed network packets, bugs
and long GC pauses.

One solution we've thought of is to set the sequence_id as the key for the
Kafka messages and have a proxy running on each Kafka broker which refuses to
write new messages if they don't have the next expected key. This seems to
solve any issue we would have with badly behaving networks or processes.

Is there a better solution? Should we just handle these inconsistencies in our
consumers? Are we being too paranoid?

As a side note, it seems like this functionality (guaranteeing that all keys
in a partition are in sequence on a particular topic) may be a nice option to
have in Kafka proper.

Thanks,
Bob
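A sketch of the gap check described above, purely illustrative and not an existing Kafka feature: the writer (or the proposed per-broker proxy) remembers the last sequence id it accepted and refuses anything that is not exactly one greater.

/**
 * Illustrative only: the single-partition ordering check described above.
 * Sequence ids are assumed to be the message keys; nothing like this ships with Kafka.
 */
public class SequenceGate {
    private long lastAccepted;

    public SequenceGate(long lastAccepted) {
        this.lastAccepted = lastAccepted;
    }

    /** Accept the event only if it is the next one in the sequence. */
    public synchronized boolean tryAccept(long sequenceId) {
        if (sequenceId != lastAccepted + 1) {
            return false; // gap or duplicate: refuse the write and let the writer resync
        }
        lastAccepted = sequenceId;
        return true;
    }
}

A consumer can run the same check to detect (rather than prevent) gaps, which is the "handle it in our consumers" alternative mentioned above.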
SocketServerStats not reporting bytes written or read
Hi all,

I have an intermittent problem with the JMX SocketServer stats on my 0.7.2
Kafka cluster.

I'm collecting the SocketServerStats with jmxstats and everything seems to be
working fine, except that kafka.SocketServerStats:BytesWrittenPerSecond and
kafka.SocketServerStats:BytesReadPerSecond are not working all the time. They
sometimes cut out and report no traffic, and then randomly report normal stats
again. I've noticed that when I start a new topic and begin sending data with
a new producer, the stats for bytes written and read suddenly zero out. The
funny thing is that the other stats seem to still be working fine, including
cumulative bytes read and written.

Does anyone know what might be causing this and how I can fix it?

Thanks,

Xuyen
Re: Kafka Migration Tool
It seems that you may have set the zk connection string to one used by 0.8
Kafka brokers.

Thanks,

Jun

On Mon, May 19, 2014 at 8:34 AM, Mo Firouz wrote:

> I'm trying to migrate from Kafka 0.7.2-2.9.2 (with Zookeeper 3.3.4 from
> Cloudera) to Kafka 0.8.1.1-2.9.2 (with official Zookeeper 3.4.5) - However
> hitting brick walls with a very mysterious problem:
>
> Caused by: java.lang.NumberFormatException: For input string:
> ""1400511498394","host""
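One quick way to verify that is to look at what is registered under /brokers/ids in the ZooKeeper ensemble the migration tool's 0.7 consumer points at: an 0.7 broker registers a plain host-and-port style string, while an 0.8 broker registers JSON with "timestamp" and "host" fields, which is what the NumberFormatException above appears to be choking on. A minimal sketch, with a placeholder connect string:

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class BrokerRegistrationProbe {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        // Placeholder connect string: use the one from the migration tool's 0.7 consumer config.
        ZooKeeper zk = new ZooKeeper("zk-for-07-cluster:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();
        try {
            for (String id : zk.getChildren("/brokers/ids", false)) {
                byte[] data = zk.getData("/brokers/ids/" + id, false, null);
                // 0.7 registrations are plain strings; 0.8 registrations are JSON.
                System.out.println(id + " -> " + new String(data, "UTF-8"));
            }
        } finally {
            zk.close();
        }
    }
}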
Re: Question about recovering from outage
In trunk, we have a JMX metric monitoring the state of each broker. One of the
states is log recovery.

Thanks,

Jun

On Mon, May 19, 2014 at 11:15 AM, Jonas Bergström wrote:

> Ok, I'll upgrade.
> Is there a way to see the status of a node that is recovering, e.g. in
> zookeeper or via jmx?
>
> / Jonas
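For poking at whatever a given broker build exposes, a small sketch that connects to the broker's JMX port and lists everything Kafka registers is below. The host and port are placeholders (remote JMX has to be enabled on the broker, e.g. via the JMX_PORT environment variable used by the start scripts), and the exact bean names differ between versions, which is why the query pattern is deliberately broad.

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListKafkaBeans {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; match the JMX port the broker was started with.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Bean names vary across Kafka versions, so cast a wide net.
            Set<ObjectName> names = conn.queryNames(new ObjectName("*kafka*:*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
            }
        } finally {
            connector.close();
        }
    }
}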
Re: SocketServerStats not reporting bytes written or read
Is the problem with the JMX beans themselves or with jmxstats?

Thanks,

Jun

On Mon, May 19, 2014 at 2:48 PM, Xuyen On wrote:

> I have an intermittent problem with the JMX SocketServer stats on my 0.7.2
> Kafka cluster.
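One way to separate the two is to read the attributes straight off the MBean, bypassing jmxstats, and compare. A minimal sketch follows; the host/port are placeholders, and the object name kafka:type=kafka.SocketServerStats plus the cumulative attribute names are assumptions based on the 0.7 bean described above, so adjust them to whatever jconsole shows on the broker.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SocketServerStatsProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port for the broker's JMX listener.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Assumed 0.7 object name; verify with jconsole if it differs.
            ObjectName stats = new ObjectName("kafka:type=kafka.SocketServerStats");
            for (String attr : new String[] {
                    "BytesWrittenPerSecond", "BytesReadPerSecond",
                    "TotalBytesWritten", "TotalBytesRead" }) {
                System.out.println(attr + " = " + conn.getAttribute(stats, attr));
            }
        } finally {
            connector.close();
        }
    }
}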
Re: Consistent replication of an event stream into Kafka
Hello Bob,

What you described is similar to the idempotent producer design that we are
now discussing:

https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer

Do you think this new feature will solve your case?

Guozhang

On Mon, May 19, 2014 at 2:40 PM, Bob Potter wrote:

> We have a use case where we want to replicate an event stream which exists
> outside of kafka into a kafka topic (single partition). The event stream
> has sequence ids which always increase by 1. We want to preserve this
> ordering.