Re: 0.7.3?
Looking at https://issues.apache.org/jira/browse/KAFKA#selectedTab=com.atlassian.jira.plugin.system.project%3Aversions-panel I didn't see a 0.7.3, so I created one. Jira also thinks 0.7.2 is still unreleased with KAFKA-411 open, but I'm not sure where that should belong. On 05/23/2013 05:25 PM, Neha Narkhede wrote: > Do you mind filing a JIRA for this? Feel free to upload a patch. > > Thanks, > Neha > > > On Thu, May 23, 2013 at 12:25 PM, Chris Burroughs > wrote: > >> Did this ever get traction? >> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=summary shows >> activity in the 0.7 branch but the last tag as >> kafka-0.7.2-incubating-candidate-5 >> >> >> http://mail-archives.apache.org/mod_mbox/kafka-dev/201302.mbox/%3cce36b916-a8b9-40c3-8a0e-958397c17...@gmail.com%3E >> >
[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673182#comment-13673182 ] Jun Rao commented on KAFKA-927: --- Thanks for patch v2. A few more comments: 20. KafkaController: If, when shutdownBroker is called, the controller is no longer active, both state machines will throw an exception on state change calls. However, the issue is that we add the shutdown broker to controllerContext.shuttingDownBrokerIds and it's never reset. This may become a problem if this broker becomes a controller again. At the minimum, we need to reset controllerContext.shuttingDownBrokerIds in onControllerFailover(). However, I am a bit confused why we never reset controllerContext.shuttingDownBrokerIds and the shutdown logic still works. 21. ControlledShutdownRequest.handleError(): We should probably set partitionsRemaining in ControlledShutdownResponse to empty instead of null, since the serialization of ControlledShutdownResponse doesn't handle partitionsRemaining being null. 22. testRollingBounce: 22.1 The test makes sure that the leader for topic1 is changed after broker 0 is shut down. However, the leader for topic1 could be on broker 1 initially. In this case, the leader won't be changed after broker 0 is shut down. 22.2 The default controlledShutdownRetryBackoffMs is 5 secs, which is probably too long for the unit test. 23. KafkaServer: We need to handle the errorCode in ControlledShutdownResponse since the controller may have moved after we send the ControlledShutdown request. From the previous review: 3. I think a simple solution is to (1) not call replicaManager.replicaFetcherManager.closeAllFetchers() in KafkaServer during shutdown; (2) in KafkaController.shutdownBroker(), for each partition on the shutdown broker, we first send a stopReplicaRequest to it for that partition before going through the state machine logic.
Since the state machine logic involves ZK reads/writes, it's very likely that the stopReplicaRequest will reach the broker before the subsequent LeaderAndIsr requests. So, in most cases, the leader should be able to shrink ISR quicker than the timeout, without churns in ISR. > Integrate controlled shutdown into kafka shutdown hook > -- > > Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927 > Project: Kafka > Issue Type: Bug >Reporter: Sriram Subramanian >Assignee: Sriram Subramanian > Attachments: KAFKA-927.patch, KAFKA-927-v2.patch > > > The controlled shutdown mechanism should be integrated into the software for > better operational benefits. Also few optimizations can be done to reduce > unnecessary rpc and zk calls. This patch has been tested on a prod like > environment by doing rolling bounces continuously for a day. The average time > of doing a rolling bounce with controlled shutdown for a cluster with 7 nodes > without this patch is 340 seconds. With this patch it reduces to 220 seconds. > Also it ensures correctness in scenarios where the controller shrinks the isr > and the new leader could place the broker to be shutdown back into the isr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
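Review point 21 above (return an empty partitionsRemaining instead of null) can be illustrated with a minimal sketch. This is hypothetical Java, not Kafka's actual Scala code; the class name, field names, and byte layout are invented for illustration. It shows why a serialization-size pass over the collection would hit a NullPointerException on null, while an empty collection serializes cleanly to a zero-count payload.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical model of the response object discussed in review point 21.
// Field names and the byte layout are invented for illustration only.
class ControlledShutdownResponseSketch {
    final short errorCode;
    final Map<String, Integer> partitionsRemaining; // partition -> leader, simplified

    ControlledShutdownResponseSketch(short errorCode, Map<String, Integer> partitionsRemaining) {
        this.errorCode = errorCode;
        this.partitionsRemaining = partitionsRemaining;
    }

    // Error path modeled on the review suggestion: always empty, never null.
    static ControlledShutdownResponseSketch forError(short errorCode) {
        return new ControlledShutdownResponseSketch(errorCode, Collections.emptyMap());
    }

    int sizeInBytes() {
        // 2 bytes error code + 4 bytes entry count + 8 bytes per entry (illustrative layout).
        // This line would throw a NullPointerException if partitionsRemaining were null.
        return 2 + 4 + 8 * partitionsRemaining.size();
    }
}
```

With the empty-map error path, `forError(...).sizeInBytes()` is well-defined, which is the point of the suggested fix.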
[jira] [Updated] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriram Subramanian updated KAFKA-927: - Attachment: KAFKA-927-v2-revised.patch Realized my previous patch did not have my latest changes, just the new files. 20. shuttingDownBrokerIds does get updated on broker failure. 21. Done. 22.1 I had already fixed this; the new patch should have the change. 23. This is also handled in the new patch. 3. That sounds reasonable among all the hacky fixes. > Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927 > Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, > KAFKA-927-v2-revised.patch
[jira] [Closed] (KAFKA-897) NullPointerException in ConsoleConsumer
[ https://issues.apache.org/jira/browse/KAFKA-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin B. closed KAFKA-897.
> NullPointerException in ConsoleConsumer
> Key: KAFKA-897
> URL: https://issues.apache.org/jira/browse/KAFKA-897
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.8
> Reporter: Colin B.
> Assignee: Neha Narkhede
> Priority: Minor
> Fix For: 0.8.1
> Attachments: Kafka897-v1.patch, KAFKA-897-v2.patch
>
> The protocol document [1] mentions that keys and values in message sets can
> be null. However the ConsoleConsumer throws an NPE when a null is passed for
> the value.
> java.lang.NullPointerException
>     at kafka.utils.Utils$.readBytes(Utils.scala:141)
>     at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:106)
>     at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
>     at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:61)
>     at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:53)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:631)
>     at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
>     at kafka.consumer.KafkaStream.foreach(KafkaStream.scala:25)
>     at kafka.consumer.ConsoleConsumer$.main(ConsoleConsumer.scala:195)
>     at kafka.consumer.ConsoleConsumer.main(ConsoleConsumer.scala)
> [1] https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets
[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673380#comment-13673380 ] Neha Narkhede commented on KAFKA-927: - Thanks for the revised v2 patch. A few more comments - 1. KafkaServer 1.1 startupComplete should either be a volatile variable or an AtomicBoolean. Two different threads call startup() and controlledShutdown(), which modify startupComplete. 1.2 In controlledShutdown(), we need to handle error codes in ControlledShutdownResponse explicitly. It can happen that the error code is set and partitionsRemaining is 0, which will lead to errors. 2. Partition From previous review #4: if the broker has to ignore the become-follower request anyway, does it make sense to even process part of it and truncate the log etc.? 3. From previous review #3, I meant that it is pointless to do the ZK write on the controller, since right after the write the follower hasn't received the stop replica request and the leader hasn't received the shrunk isr, so the broker being shut down will get added back to ISR. You can verify that this happens from the logs. It also makes controlled shutdown very slow, since typically in production we move ~1000 partitions from the broker and zk writes can take ~20ms, which means several seconds wasted just doing the ZK writes. Instead, it is enough to let the leader shrink the isr by sending it the leader and isr request. On the other hand, we can argue that the OfflineReplica state change itself should be changed to avoid the ZK write. But that is a bigger change, so we should avoid that right now.
> Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927 > Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, > KAFKA-927-v2-revised.patch
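Point 1.1 above (make startupComplete safe for cross-thread access) can be sketched as follows. This is an illustrative Java fragment with invented class and method names, not the actual KafkaServer code; it just shows the AtomicBoolean variant of the suggestion, which gives the same visibility guarantee as a volatile field plus atomic compound updates.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of review point 1.1: startup() and controlledShutdown()
// run on different threads, so the completion flag needs volatile semantics.
// An AtomicBoolean provides that (and atomic compare-and-set if ever needed).
class ServerLifecycleSketch {
    private final AtomicBoolean startupComplete = new AtomicBoolean(false);

    void startup() {
        // ... bring up log manager, request handlers, etc. ...
        startupComplete.set(true); // visible to the shutdown thread immediately
    }

    boolean controlledShutdownAllowed() {
        // With a plain (non-volatile) boolean, the shutdown thread could read
        // a stale false here even after startup() has finished.
        return startupComplete.get();
    }
}
```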
[jira] [Created] (KAFKA-929) Download link in 0.7 quickstart broken
David Arthur created KAFKA-929: -- Summary: Download link in 0.7 quickstart broken Key: KAFKA-929 URL: https://issues.apache.org/jira/browse/KAFKA-929 Project: Kafka Issue Type: Bug Components: website Reporter: David Arthur http://kafka.apache.org/07/quickstart.html links to http://kafka.apache.org/07/downloads.html, instead of http://kafka.apache.org/downloads.html
[jira] Subscription: outstanding kafka patches
Issue Subscription
Filter: outstanding kafka patches (76 issues)
The list of outstanding kafka patches
Subscriber: kafka-mailing-list

Key       Summary
KAFKA-928 new topics may not be processed after ZK session expiration in controller
          https://issues.apache.org/jira/browse/KAFKA-928
KAFKA-927 Integrate controlled shutdown into kafka shutdown hook
          https://issues.apache.org/jira/browse/KAFKA-927
KAFKA-925 Add optional partition key override in producer
          https://issues.apache.org/jira/browse/KAFKA-925
KAFKA-923 Improve controller failover latency
          https://issues.apache.org/jira/browse/KAFKA-923
KAFKA-922 System Test - set retry.backoff.ms=300 to testcase_0119
          https://issues.apache.org/jira/browse/KAFKA-922
KAFKA-917 Expose zk.session.timeout.ms in console consumer
          https://issues.apache.org/jira/browse/KAFKA-917
KAFKA-915 System Test - Mirror Maker testcase_5001 failed
          https://issues.apache.org/jira/browse/KAFKA-915
KAFKA-911 Bug in controlled shutdown logic in controller leads to controller not sending out some state change request
          https://issues.apache.org/jira/browse/KAFKA-911
KAFKA-905 Logs can have same offsets causing recovery failure
          https://issues.apache.org/jira/browse/KAFKA-905
KAFKA-903 [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
          https://issues.apache.org/jira/browse/KAFKA-903
KAFKA-898 Add a KafkaMetricsReporter that wraps Librato's reporter
          https://issues.apache.org/jira/browse/KAFKA-898
KAFKA-896 merge 0.8 (988d4d8e65a14390abd748318a64e281e4a37c19) to trunk
          https://issues.apache.org/jira/browse/KAFKA-896
KAFKA-885 sbt package builds two kafka jars
          https://issues.apache.org/jira/browse/KAFKA-885
KAFKA-881 Kafka broker not respecting log.roll.hours
          https://issues.apache.org/jira/browse/KAFKA-881
KAFKA-877 Still getting kafka.common.NotLeaderForPartitionException
          https://issues.apache.org/jira/browse/KAFKA-877
KAFKA-873 Consider replacing zkclient with curator (with zkclient-bridge)
          https://issues.apache.org/jira/browse/KAFKA-873
KAFKA-868 System Test - add test case for rolling controlled shutdown
          https://issues.apache.org/jira/browse/KAFKA-868
KAFKA-863 System Test - update 0.7 version of kafka-run-class.sh for Migration Tool test cases
          https://issues.apache.org/jira/browse/KAFKA-863
KAFKA-859 support basic auth protection of mx4j console
          https://issues.apache.org/jira/browse/KAFKA-859
KAFKA-855 Ant+Ivy build for Kafka
          https://issues.apache.org/jira/browse/KAFKA-855
KAFKA-854 Upgrade dependencies for 0.8
          https://issues.apache.org/jira/browse/KAFKA-854
KAFKA-852 Remove clientId from OffsetFetchResponse and OffsetCommitResponse
          https://issues.apache.org/jira/browse/KAFKA-852
KAFKA-836 Update quickstart for Kafka 0.8
          https://issues.apache.org/jira/browse/KAFKA-836
KAFKA-835 Update 0.8 configs on the website
          https://issues.apache.org/jira/browse/KAFKA-835
KAFKA-815 Improve SimpleConsumerShell to take in a max messages config option
          https://issues.apache.org/jira/browse/KAFKA-815
KAFKA-745 Remove getShutdownReceive() and other kafka specific code from the RequestChannel
          https://issues.apache.org/jira/browse/KAFKA-745
KAFKA-739 Handle null values in Message payload
          https://issues.apache.org/jira/browse/KAFKA-739
KAFKA-735 Add looping and JSON output for ConsumerOffsetChecker
          https://issues.apache.org/jira/browse/KAFKA-735
KAFKA-717 scala 2.10 build support
          https://issues.apache.org/jira/browse/KAFKA-717
KAFKA-705 Controlled shutdown doesn't seem to work on more than one broker in a cluster
          https://issues.apache.org/jira/browse/KAFKA-705
KAFKA-686 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper
          https://issues.apache.org/jira/browse/KAFKA-686
KAFKA-682 java.lang.OutOfMemoryError: Java heap space
          https://issues.apache.org/jira/browse/KAFKA-682
KAFKA-677 Retention process gives exception if an empty segment is chosen for collection
          https://issues.apache.org/jira/browse/KAFKA-677
KAFKA-674 Clean Shutdown Testing - Log segments checksums mismatch
          https://issues.apache.org/jira/browse/KAFKA-674
KAFKA-652 Create testcases for clean shut-down
          https://issues.apache.org/jira/browse/KAFKA-652
KAFKA-649 Cleanup log4j logging
          https://issues.apache.org/jira/browse/KAFKA-649
KAFKA-645 Create a shell script to run System Test with DEBUG details and "tee" console output to a file
          https://issues.apache.org/jira/browse/KAFKA-645
KAFKA-637 Separate l
[jira] [Updated] (KAFKA-928) new topics may not be processed after ZK session expiration in controller
[ https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-928: Attachment: kafka-928-v2.patch I think you are right, we don't need both anymore. See the updated patch. > new topics may not be processed after ZK session expiration in controller > - > > Key: KAFKA-928 > URL: https://issues.apache.org/jira/browse/KAFKA-928 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.8 >Reporter: Jun Rao >Assignee: Neha Narkhede >Priority: Blocker > Attachments: kafka-928.patch, kafka-928-v2.patch > > > When controller loses its ZK session, it calls partitionStateMachine.shutdown > in SessionExpirationListener, which marks the partitionStateMachine as down. > However, when the controller regains its controllership, it doesn't mark > partitionStateMachine as up. In TopicChangeListener, we only process new > topics if the partitionStateMachine is marked up.
[jira] [Updated] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriram Subramanian updated KAFKA-927: - Attachment: KAFKA-927-v3.patch 1.1 Done. 1.2 Done. 2. We would need to do some of this to ensure the new leader is updated, and the log itself is going to be truncated either on startup or shutdown. Hence I did not feel a strong reason to make this path more optimized. 3. As we spoke offline, there seems to be an edge case where not updating ZK could lead to bad things happening. So we update ZK before the leaderAndIsr request. > Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927 > Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, > KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch
[jira] [Commented] (KAFKA-928) new topics may not be processed after ZK session expiration in controller
[ https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673533#comment-13673533 ] Jun Rao commented on KAFKA-928: --- Thanks for patch v2. +1. > Key: KAFKA-928 > URL: https://issues.apache.org/jira/browse/KAFKA-928
[jira] [Updated] (KAFKA-928) new topics may not be processed after ZK session expiration in controller
[ https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-928: Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the review, committed patch to 0.8. > Key: KAFKA-928 > URL: https://issues.apache.org/jira/browse/KAFKA-928
[jira] [Closed] (KAFKA-928) new topics may not be processed after ZK session expiration in controller
[ https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede closed KAFKA-928. --- > Key: KAFKA-928 > URL: https://issues.apache.org/jira/browse/KAFKA-928
[jira] [Commented] (KAFKA-928) new topics may not be processed after ZK session expiration in controller
[ https://issues.apache.org/jira/browse/KAFKA-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673571#comment-13673571 ] Swapnil Ghike commented on KAFKA-928: - Was just about to comment, perhaps it would be good to rename hasStarted to isRunning like in KafkaController. +1 otherwise. > Key: KAFKA-928 > URL: https://issues.apache.org/jira/browse/KAFKA-928
[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673651#comment-13673651 ] Jun Rao commented on KAFKA-927: --- Thanks for patch v3. A few more comments: 30. KafkaServer: 30.1 Could you combine isShuttingDown and startupComplete? 30.2 In controlledShutdown(), it's not clear if it's worth caching the socket channel. Technically, it's possible for a controller to come back on the broker with the same id, but with a different broker host/port. It's simpler to just always close the socket channel on each ControlledShutdownRequest and create a new channel on retry. 31. KafkaController: 31.1 remove unused import java.util.concurrent.{Semaphore 31.2 I think we still need to set shuttingDownBrokerIds to empty in onControllerFailover(). A controller may failover during a controlled shutdown and later regain the controllership. OnBrokerFailure() is only called if the controller is active. So shuttingDownBrokerIds may not be empty when the controllership switches back. > Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927 > Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, > KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch
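Point 31.2 (clear shuttingDownBrokerIds on controller failover) amounts to resetting per-controllership state when a broker regains the controllership. The following is a hedged Java sketch with invented names, not the real Scala KafkaController; it only illustrates why stale entries would otherwise survive a lose-and-regain cycle.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the controller state discussed in point 31.2.
// A controller may fail over mid-controlled-shutdown and later regain the
// controllership; without the clear() in onControllerFailover(), the set
// would still contain the broker from the previous controllership.
class ControllerContextSketch {
    final Set<Integer> shuttingDownBrokerIds = new HashSet<>();

    void shutdownBroker(int brokerId) {
        shuttingDownBrokerIds.add(brokerId);
        // ... move leaders off the broker, send StopReplica requests, etc. ...
    }

    void onControllerFailover() {
        // Reset state that is only meaningful within one controllership.
        shuttingDownBrokerIds.clear();
        // ... re-read cluster state from ZooKeeper, re-register listeners ...
    }
}
```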
[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673659#comment-13673659 ] Neha Narkhede commented on KAFKA-927: - +1 on v3 other than Jun's comments. > Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927
[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673666#comment-13673666 ] Sriram Subramanian commented on KAFKA-927: -- 30.1 Don't feel strongly about this. I think it makes things less readable with not much savings. 30.2 The new broker includes the host and port, and hence it works. 31.1 Done. 31.2 This is already there in the previous patch. It is in initializeControllerContext. > Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927 > Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, > KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch, > KAFKA-927-v3-removeimports.patch
[jira] [Updated] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriram Subramanian updated KAFKA-927: - Attachment: KAFKA-927-v3-removeimports.patch > Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927 > Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, > KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch, > KAFKA-927-v3-removeimports.patch
[jira] [Updated] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriram Subramanian updated KAFKA-927: - Attachment: KAFKA-927-v4.patch From offline feedback: 1. reset startupComplete flag on shutdown for unit test 2. cleaned up the channel before shutting down > Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927 > Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, > KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch, > KAFKA-927-v3-removeimports.patch, KAFKA-927-v4.patch
[jira] [Created] (KAFKA-930) Integrate preferred replica election logic into kafka
Sriram Subramanian created KAFKA-930:
-------------------------------------

             Summary: Integrate preferred replica election logic into kafka
                 Key: KAFKA-930
                 URL: https://issues.apache.org/jira/browse/KAFKA-930
             Project: Kafka
          Issue Type: Bug
            Reporter: Sriram Subramanian
            Assignee: Sriram Subramanian
             Fix For: 0.9

It seems useful to integrate the preferred replica election logic into the kafka controller. A simple way to implement this would be a background thread that periodically finds the topic partitions whose leader is not the preferred broker and initiates the move. We could come up with heuristics to initiate the move only if the imbalance is over a specific threshold, in order to avoid rebalancing too aggressively. Making the software do this reduces operational cost.
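The threshold heuristic suggested above could look roughly like this. An illustrative Python sketch: the function name, data shapes, and the 10% threshold are all made up for the example, not taken from Kafka's code.

```python
def imbalance_ratio(assignments, leaders):
    """assignments: partition -> replica list (first entry is the
    preferred replica); leaders: partition -> current leader broker id."""
    off_preferred = sum(1 for p, replicas in assignments.items()
                        if leaders[p] != replicas[0])
    return off_preferred / len(assignments)

# 1 of 4 partitions (t-1) is led by a non-preferred broker.
assignments = {"t-0": [1, 2], "t-1": [2, 3], "t-2": [3, 1], "t-3": [1, 3]}
leaders     = {"t-0": 1,      "t-1": 3,      "t-2": 3,      "t-3": 1}

THRESHOLD = 0.10                  # hypothetical threshold for the example
ratio = imbalance_ratio(assignments, leaders)
should_rebalance = ratio > THRESHOLD
```

The background thread would compute this ratio each period and trigger preferred replica election only when it exceeds the threshold.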
[jira] [Resolved] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook
[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao resolved KAFKA-927.
---------------------------

       Resolution: Fixed
    Fix Version/s: 0.8

Thanks for patch v4. +1 and committed to 0.8.
[jira] [Commented] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
[ https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673781#comment-13673781 ]

Jay Kreps commented on KAFKA-903:
---------------------------------

+1

> [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-903
>                 URL: https://issues.apache.org/jira/browse/KAFKA-903
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>         Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, probably copied on 4/30. kafka-0.8, built current on 4/30.
> -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
> -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
> -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 commons-collections-3.2.1.jar
> -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
> -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
> -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
> -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
> -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
> -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
> -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
> -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
> -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 jackson-dataformat-smile-2.1.4.jar
> -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 jackson-mapper-asl-1.9.12.jar
> -rwx--+ 1 reefedjib None   47913 May  9 23:39 jopt-simple-3.0-rc2.jar
> -rwx--+ 1 reefedjib None 2365575 Apr 30 13:06 kafka_2.8.0-0.8.0-SNAPSHOT.jar
> -rwx--+ 1 reefedjib None  481535 Jan 11 13:46 log4j-1.2.16.jar
> -rwx--+ 1 reefedjib None   20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
> -rwx--+ 1 reefedjib None  251784 Apr 18 13:41 logback-classic-1.0.6.jar
> -rwx--+ 1 reefedjib None  349706 Apr 18 13:41 logback-core-1.0.6.jar
> -rwx--+ 1 reefedjib None   82123 Nov 26 13:11 metrics-core-2.2.0.jar
> -rwx--+ 1 reefedjib None 1540457 Jul 12  2012 ojdbc14.jar
> -rwx--+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
> -rwx--+ 1 reefedjib None 3114958 Apr  2 07:47 scalatest_2.10-1.9.1.jar
> -rwx--+ 1 reefedjib None   25962 Apr 18 13:41 slf4j-api-1.6.5.jar
> -rwx--+ 1 reefedjib None   62269 Nov 29 03:26 zkclient-0.2.jar
> -rwx--+ 1 reefedjib None  601677 Apr 18 13:41 zookeeper-3.3.3.jar
>            Reporter: Rob Withers
>            Priority: Blocker
>         Attachments: kafka_2.8.0-0.8.0-SNAPSHOT.jar, kafka-903.patch, kafka-903_v2.patch, kafka-903_v3.patch
>
> This FATAL shuts down both brokers on windows,
> {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1 messages with no compression to [robert_v_2x0,0]
> {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer sending messages with correlation id 178 for topics [robert_v_2x0,0] to broker 1 on 192.168.1.100:9093
> {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
> {2013-05-10 18:23:57,739} INFO [Thread-4] (Logging.scala:67) - [Kafka Server 0], shutting down
> Furthermore, attempts to restart them fail, with the following log:
> {2013-05-10 19:14:52,156} INFO [Thread-1] (Logging.scala:67) - [Kafka Server 0], started
> {2013-05-10 19:14:52,157} INFO [ZkClient-EventThread-32-localhost:2181] (Logging.scala:67) - New leader is 0
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] (ZkEventThread.java:79) - Delivering event #1 done
> {2013-05-10 19:14:52,193} DEBUG [ZkClient-EventThread-32-localhost:2181] (ZkEventThread.java:69) - Delivering event #4 ZkEvent[Data of /controller_epoch changed sent to kafka.controller.ControllerEpochListener@5cb88f42]
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] (FinalRequestProcessor.java:78) - Processing request:: sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffe txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,210} DEBUG [SyncThread:0] (FinalRequestProcessor.java:160) - sessionid:0x13e9127882e0001 type:exists cxid:0x1d zxid:0xfffe txntype:unknown reqpath:/controller_epoch
> {2013-05-10 19:14:52,213} DEBUG [Thread-1-SendThread(localhost:2181)] (ClientCnxn.java:838) - Reading reply sessionid:0x1
[jira] [Updated] (KAFKA-905) Logs can have same offsets causing recovery failure
[ https://issues.apache.org/jira/browse/KAFKA-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriram Subramanian updated KAFKA-905:
-------------------------------------

    Attachment: KAFKA-905-v2.patch

- made the logging changes
- added the missing file

> Logs can have same offsets causing recovery failure
> ---------------------------------------------------
>
>                 Key: KAFKA-905
>                 URL: https://issues.apache.org/jira/browse/KAFKA-905
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Sriram Subramanian
>            Assignee: Sriram Subramanian
>             Fix For: 0.8
>
>         Attachments: KAFKA-905.patch, KAFKA-905.rtf, KAFKA-905-v2.patch
>
> Consider the following scenario (L = leader log, F = follower log; each row shows "offset  messages"):
>
>  L            F
>  1 m1,m2      1 m1,m2
>  3 m3,m4      3 m3,m4
>  5 m5,m6      5 m5,m6
>  HW = 6       HW = 4
>
> The follower goes down and comes back up. It truncates its log to its HW:
>
>  L            F
>  1 m1,m2      1 m1,m2
>  3 m3,m4      3 m3,m4
>  5 m5,m6
>  HW = 6       HW = 4
>
> Before the follower catches up with the leader, the leader goes down and the follower becomes the leader. It then gets new messages:
>
>  F            L
>  1 m1,m2      1 m1,m2
>  3 m3,m4      3 m3,m4
>  5 m5,m6      10 m5-m10
>  HW = 6       HW = 4
>
> The follower fetches from offset 7. Since offset 7 is within the compressed message at offset 10 on the leader, the whole message chunk is sent to the follower:
>
>  F            L
>  1 m1,m2      1 m1,m2
>  3 m3,m4      3 m3,m4
>  5 m5,m6      10 m5-m10
>  10 m5-m10
>  HW = 4       HW = 10
>
> The follower log now contains the same offsets twice. On recovery, re-indexing will fail due to repeated offsets.
> Possible ways to fix this:
> 1. The fetcher thread can do deep iteration instead of shallow iteration and drop the offsets that are less than the log end offset. This would, however, incur a performance hit.
> 2. To optimize step 1, we could do the deep iteration until the logical offset of the fetched message set is greater than the log end offset of the follower log, and then switch to shallow iteration.
> 3. On recovery, we just truncate the active segment and refetch the data.
> All of the above are hacky. The right fix is to ensure we never corrupt the logs. We can incur data loss but should not compromise consistency. For 0.8, the easiest and simplest fix would be 3.
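Option 1 above (deep iteration that drops already-present offsets) can be sketched as follows. This is an illustrative Python model only: Kafka's fetcher is Scala and operates on compressed message sets, so the sketch models just the offset-filtering step after deep iteration has decompressed the chunk.

```python
def drop_duplicate_offsets(deep_iterated, log_end_offset):
    """Keep only messages whose offset is beyond the follower's log end,
    so re-appending a re-sent compressed chunk cannot repeat offsets."""
    return [(offset, msg) for offset, msg in deep_iterated
            if offset > log_end_offset]

# Follower's log ends at offset 6; the leader re-sends the whole
# compressed chunk m5-m10 because the fetch offset fell inside it.
chunk = [(5, "m5"), (6, "m6"), (7, "m7"), (8, "m8"), (9, "m9"), (10, "m10")]
to_append = drop_duplicate_offsets(chunk, log_end_offset=6)
```

Only offsets 7-10 would be appended, avoiding the duplicate-offset corruption, at the cost of decompressing every fetched chunk.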
[jira] [Updated] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
[ https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-903:
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.8
         Assignee: Jun Rao
           Status: Resolved  (was: Patch Available)

Thanks for the review. Committed v3 to 0.8.
[jira] [Resolved] (KAFKA-905) Logs can have same offsets causing recovery failure
[ https://issues.apache.org/jira/browse/KAFKA-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede resolved KAFKA-905.
---------------------------------

    Resolution: Fixed

Thanks for v2, committed it to 0.8.
[jira] [Created] (KAFKA-931) make zookeeper.connect a required property
Jun Rao created KAFKA-931:
--------------------------

             Summary: make zookeeper.connect a required property
                 Key: KAFKA-931
                 URL: https://issues.apache.org/jira/browse/KAFKA-931
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 0.8
            Reporter: Jun Rao
            Assignee: Jun Rao

Currently, zookeeper.connect defaults to a null string. If this property is not overridden, we see a confusing NullPointerException from the ZK client. It would be better to make zookeeper.connect a required property.
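The improvement amounts to validating the property at config-construction time instead of passing null down to the ZK client. A minimal sketch of such fail-fast validation (illustrative Python; the function and property dictionary are placeholders, not Kafka's ConsumerConfig code):

```python
def get_required(props, name):
    """Fail fast with a clear message instead of letting a null value
    surface later as a NullPointerException inside the ZK client."""
    value = props.get(name)
    if value is None:
        raise ValueError("Missing required property: " + name)
    return value

props = {"zookeeper.connect": "localhost:2181"}
zk_connect = get_required(props, "zookeeper.connect")
```

A config missing the property would raise immediately at startup, with the property name in the error message, rather than failing deep inside ZK session setup.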
[jira] [Updated] (KAFKA-905) Logs can have same offsets causing recovery failure
[ https://issues.apache.org/jira/browse/KAFKA-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriram Subramanian updated KAFKA-905:
-------------------------------------

    Attachment: KAFKA-905-trunk.patch

changes for trunk
[jira] [Updated] (KAFKA-931) make zookeeper.connect a required property
[ https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-931:
--------------------------

    Attachment: kafka-931.patch

Attached a patch that does the following:
1. Makes zookeeper.connect a required property.
2. Changes the default value of auto.offset.reset to largest (to be consistent with ConsoleConsumer).
3. Changes the property name queued.max.messages to make it more intuitive.
[jira] [Commented] (KAFKA-931) make zookeeper.connect a required property
[ https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673858#comment-13673858 ]

Neha Narkhede commented on KAFKA-931:
-------------------------------------

+1. Minor suggestion: can we change the default of queued.max.message.chunks to 2? As long as there is one more message chunk ready to be processed, that is good enough; having more chunks in the queue is somewhat pointless.
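The rationale for a default of 2 is that queued.max.message.chunks bounds a hand-off buffer between the consumer's fetcher thread and the iterator draining it: one chunk being consumed plus one buffered and ready is enough to keep the consumer busy. A bounded blocking queue models this (illustrative Python sketch, not Kafka's internal queue):

```python
from queue import Queue

# The fetcher produces decoded message chunks; the consumer iterator
# drains them. With maxsize=2 the fetcher blocks as soon as two chunks
# are buffered, so memory stays bounded while one chunk is always ready.
chunk_queue = Queue(maxsize=2)
chunk_queue.put("chunk-1")
chunk_queue.put("chunk-2")
queue_full = chunk_queue.full()   # the fetcher would block on put() now
next_chunk = chunk_queue.get()    # the consumer frees a slot
```

A larger maxsize only increases memory held in queued chunks without making the consumer any faster, which is the point of the suggestion.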
[jira] [Updated] (KAFKA-931) make zookeeper.connect a required property
[ https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-931:
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.8
           Status: Resolved  (was: Patch Available)

Thanks for the review. Committed to 0.8 with the suggested change.
[jira] [Work started] (KAFKA-931) make zookeeper.connect a required property
[ https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on KAFKA-931 started by Jun Rao.
[jira] [Updated] (KAFKA-931) make zookeeper.connect a required property
[ https://issues.apache.org/jira/browse/KAFKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-931:
--------------------------

    Status: Patch Available  (was: In Progress)
[jira] [Commented] (KAFKA-917) Expose zk.session.timeout.ms in console consumer
[ https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673999#comment-13673999 ]

Jun Rao commented on KAFKA-917:
-------------------------------

The patch no longer applies to 0.8. Could you rebase?

> Expose zk.session.timeout.ms in console consumer
> ------------------------------------------------
>
>                 Key: KAFKA-917
>                 URL: https://issues.apache.org/jira/browse/KAFKA-917
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.7, 0.8
>            Reporter: Swapnil Ghike
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-917.patch
[jira] [Resolved] (KAFKA-929) Download link in 0.7 quickstart broken
[ https://issues.apache.org/jira/browse/KAFKA-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao resolved KAFKA-929.
---------------------------

    Resolution: Fixed

Thanks for identifying this. Fixed the website.

> Download link in 0.7 quickstart broken
> --------------------------------------
>
>                 Key: KAFKA-929
>                 URL: https://issues.apache.org/jira/browse/KAFKA-929
>             Project: Kafka
>          Issue Type: Bug
>          Components: website
>            Reporter: David Arthur
>
> http://kafka.apache.org/07/quickstart.html links to http://kafka.apache.org/07/downloads.html instead of http://kafka.apache.org/downloads.html