Review Request 37480: Patch for KAFKA-2434
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37480/ --- Review request for kafka. Bugs: KAFKA-2434 https://issues.apache.org/jira/browse/KAFKA-2434 Repository: kafka Description --- Remove identical topic subscription constraint for roundrobin strategy in old consumer API Diffs - core/src/main/scala/kafka/consumer/PartitionAssignor.scala 849284ad2cfa00ed36a36548556813df639ceae9 core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala e42d10488f8dfded0f34ba5da2da74a3cb2a5947 core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala adf08010597b7c6ed72eddf93962497c3209e10f Diff: https://reviews.apache.org/r/37480/diff/ Testing --- Thanks, Andrew Olson
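For context on what removing the identical-subscription constraint allows, here is a minimal, self-contained Scala sketch (not the actual kafka.consumer.PartitionAssignor code; all names are illustrative) of a round-robin assignment that tolerates consumers with differing topic subscriptions by only handing a partition to a consumer that is actually subscribed to its topic.

{code}
// Illustrative sketch only -- not the actual kafka.consumer.PartitionAssignor code.
// It round-robins partitions over the consumers in a group while skipping any
// consumer that is not subscribed to the partition's topic, i.e. the behavior
// that becomes possible once the identical-subscription constraint is removed.
object RoundRobinSketch {
  case class TopicPartition(topic: String, partition: Int)

  def assign(
      partitions: Seq[TopicPartition],
      subscriptions: Map[String, Set[String]] // consumerId -> subscribed topics
  ): Map[String, Seq[TopicPartition]] = {
    val consumers = subscriptions.keys.toSeq.sorted
    require(consumers.nonEmpty, "at least one consumer is required")
    val assignment = scala.collection.mutable.Map[String, Vector[TopicPartition]]()
      .withDefaultValue(Vector.empty)
    var cursor = 0
    for (tp <- partitions.sortBy(p => (p.topic, p.partition))) {
      // Advance the cursor until a consumer subscribed to this topic is found,
      // giving up after one full pass (then the partition stays unassigned).
      var attempts = 0
      while (!subscriptions(consumers(cursor % consumers.size)).contains(tp.topic) &&
             attempts < consumers.size) {
        cursor += 1
        attempts += 1
      }
      val candidate = consumers(cursor % consumers.size)
      if (subscriptions(candidate).contains(tp.topic))
        assignment(candidate) = assignment(candidate) :+ tp
      cursor += 1
    }
    assignment.toMap
  }

  def main(args: Array[String]): Unit = {
    val parts = for (t <- Seq("a", "b"); p <- 0 until 3) yield TopicPartition(t, p)
    val subs = Map("c1" -> Set("a", "b"), "c2" -> Set("b")) // non-identical subscriptions
    assign(parts, subs).toSeq.sortBy(_._1).foreach { case (c, ps) => println(s"$c -> $ps") }
  }
}
{code}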
Review Request 37481: Patch for KAFKA-2435
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37481/ --- Review request for kafka. Bugs: KAFKA-2435 https://issues.apache.org/jira/browse/KAFKA-2435 Repository: kafka Description --- Fair assignment strategy Diffs - clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java d35b421a515074d964c7fccb73d260b847ea5f00 core/src/main/scala/kafka/consumer/ConsumerConfig.scala 97a56ce7f2acbf9b5c9b8360268b72722791a96a core/src/main/scala/kafka/consumer/PartitionAssignor.scala 849284ad2cfa00ed36a36548556813df639ceae9 core/src/main/scala/kafka/coordinator/PartitionAssignor.scala 8499bf86f1de21091633d047e95b260003132ac8 core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala adf08010597b7c6ed72eddf93962497c3209e10f core/src/test/scala/unit/kafka/coordinator/PartitionAssignorTest.scala 887cee5a582b5737ba838920399bb9b24bf22382 Diff: https://reviews.apache.org/r/37481/diff/ Testing --- Thanks, Andrew Olson
Re: Review Request 17537: Patch for KAFKA-1028
> On March 3, 2014, 7:03 p.m., Guozhang Wang wrote: > > core/src/main/scala/kafka/admin/AdminUtils.scala, line 356 > > <https://reviews.apache.org/r/17537/diff/3/?file=456702#file456702line356> > > > > "unclean leader" is a bit confusing, how about > > uncleanLeaderElectionEnabled? Sounds good to me. Just to confirm, you are suggesting that it is preferable to name this simply "uncleanLeaderElectionEnabled" instead of something more verbose like "isUncleanLeaderElectionEnabledForTopic" or the current name? - Andrew --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/#review35999 ------- On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17537/ > --- > > (Updated Jan. 30, 2014, 7:45 p.m.) > > > Review request for kafka. > > > Bugs: KAFKA-1028 > https://issues.apache.org/jira/browse/KAFKA-1028 > > > Repository: kafka > > > Description > --- > > KAFKA-1028: per topic configuration of preference for consistency over > availability > > > Diffs > - > > core/src/main/scala/kafka/admin/AdminUtils.scala > a167756f0fd358574c8ccb42c5c96aaf13def4f5 > core/src/main/scala/kafka/common/NoReplicaOnlineException.scala > a1e12794978adf79020936c71259bbdabca8ee68 > core/src/main/scala/kafka/controller/KafkaController.scala > a0267ae2670e8d5f365e49ec0fa5db1f62b815bf > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala > fd9200f3bf941aab54df798bb5899eeb552ea3a3 > core/src/main/scala/kafka/log/LogConfig.scala > 0b32aeeffcd9d4755ac90573448d197d3f729749 > core/src/main/scala/kafka/server/KafkaConfig.scala > 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala > 73e605eb31bc71642d48b0bb8bd1632fd70b9dca > core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala > b585f0ec0b1c402d95a3b34934dab7545dcfcb1f > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > PRE-CREATION > core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala > 89c207a3f56c7a7711f8cee6fb277626329882a6 > core/src/test/scala/unit/kafka/utils/TestUtils.scala > 426b1a7bea1d83a64081f2c6b672c88c928713b7 > > Diff: https://reviews.apache.org/r/17537/diff/ > > > Testing > --- > > > Thanks, > > Andrew Olson > >
Re: Review Request 17537: Patch for KAFKA-1028
> On March 3, 2014, 7:03 p.m., Guozhang Wang wrote: > > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala, line 64 > > <https://reviews.apache.org/r/17537/diff/3/?file=456705#file456705line64> > > > > Not sure throwing an exception is the right behavior here? If unclean > > election is disabled, which means we choose consistency over availability, > > when ISR is empty we should keep the partition as offline from the clients > > until it is not empty. Throwing an exception will just halt the whole > > partition state change process, but may not necessarily resume when the ISR > > has changed. Could you confirm in your integration tests if this is the > > case? The exception-handling and partition state changes are verified in the integration test, see lines 226-249 of UncleanLeaderElectionTest. - Andrew --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/#review35999 ------- On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17537/ > --- > > (Updated Jan. 30, 2014, 7:45 p.m.) > > > Review request for kafka. > > > Bugs: KAFKA-1028 > https://issues.apache.org/jira/browse/KAFKA-1028 > > > Repository: kafka > > > Description > --- > > KAFKA-1028: per topic configuration of preference for consistency over > availability > > > Diffs > - > > core/src/main/scala/kafka/admin/AdminUtils.scala > a167756f0fd358574c8ccb42c5c96aaf13def4f5 > core/src/main/scala/kafka/common/NoReplicaOnlineException.scala > a1e12794978adf79020936c71259bbdabca8ee68 > core/src/main/scala/kafka/controller/KafkaController.scala > a0267ae2670e8d5f365e49ec0fa5db1f62b815bf > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala > fd9200f3bf941aab54df798bb5899eeb552ea3a3 > core/src/main/scala/kafka/log/LogConfig.scala > 0b32aeeffcd9d4755ac90573448d197d3f729749 > core/src/main/scala/kafka/server/KafkaConfig.scala > 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala > 73e605eb31bc71642d48b0bb8bd1632fd70b9dca > core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala > b585f0ec0b1c402d95a3b34934dab7545dcfcb1f > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > PRE-CREATION > core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala > 89c207a3f56c7a7711f8cee6fb277626329882a6 > core/src/test/scala/unit/kafka/utils/TestUtils.scala > 426b1a7bea1d83a64081f2c6b672c88c928713b7 > > Diff: https://reviews.apache.org/r/17537/diff/ > > > Testing > --- > > > Thanks, > > Andrew Olson > >
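A hypothetical sketch of the selection flow discussed above (this is not the real PartitionLeaderSelector code; names and types are illustrative): with an empty live ISR and unclean election disabled, the selector raises an exception so the partition stays offline, while with unclean election enabled it falls back to any live assigned replica.

{code}
// Hypothetical sketch of the leader-selection logic under discussion; not the
// actual kafka.controller.PartitionLeaderSelector implementation.
object LeaderSelectionSketch {
  case class LeaderAndIsr(leader: Int, isr: List[Int])
  class NoReplicaOnlineException(msg: String) extends RuntimeException(msg)

  def selectLeader(
      assignedReplicas: Seq[Int],
      liveBrokers: Set[Int],
      currentIsr: List[Int],
      uncleanLeaderElectionEnabled: Boolean): LeaderAndIsr = {
    val liveIsr = currentIsr.filter(liveBrokers.contains)
    liveIsr match {
      case head :: _ =>
        // Preferred path: elect a leader from the live in-sync replicas.
        LeaderAndIsr(head, liveIsr)
      case Nil if !uncleanLeaderElectionEnabled =>
        // Consistency over availability: keep the partition offline until an
        // in-sync replica comes back, instead of electing an out-of-sync one.
        throw new NoReplicaOnlineException(
          "No in-sync replica is alive and unclean leader election is disabled")
      case Nil =>
        // Availability over consistency: fall back to any live assigned replica,
        // accepting possible data loss.
        val liveAssigned = assignedReplicas.filter(liveBrokers.contains)
        liveAssigned.headOption
          .map(leader => LeaderAndIsr(leader, List(leader)))
          .getOrElse(throw new NoReplicaOnlineException("No assigned replica is alive"))
    }
  }
}
{code}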
Re: Review Request 17537: Patch for KAFKA-1028
> On March 3, 2014, 7:03 p.m., Guozhang Wang wrote: > > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala, line 86 > > <https://reviews.apache.org/r/17537/diff/3/?file=456708#file456708line86> > > > > I am not sure we need to check the unclean election flag here. If we > > can keep the partition as offline for clients in the controller phase, will > > that be sufficient since no data will be appended/consumed from these > > replica logs until that is unblocked? > > > > As for reading the configs, I agree that if we consider per-topic > > configs then we probably need to read it from ZK. Yes, under normal circumstances the partition would be kept offline until the previous leader is available, which would be sufficient to prevent log truncation from ever being considered. This code was added as an extra safeguard to handle the remote possibility of the config being changed while a replica is offline, which I believe is the only way this code path could be reached. I tried to describe this in the comment below: "This situation could only happen if the unclean election configuration for a topic changes while a replica is down. Otherwise, we should never encounter this situation since a non-ISR leader cannot be elected if disallowed by the broker configuration." In the follow-up patch I'll move this comment above the AdminUtils check so that it's more visible. - Andrew --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/#review35999 ------- On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17537/ > --- > > (Updated Jan. 30, 2014, 7:45 p.m.) > > > Review request for kafka. > > > Bugs: KAFKA-1028 > https://issues.apache.org/jira/browse/KAFKA-1028 > > > Repository: kafka > > > Description > --- > > KAFKA-1028: per topic configuration of preference for consistency over > availability > > > Diffs > - > > core/src/main/scala/kafka/admin/AdminUtils.scala > a167756f0fd358574c8ccb42c5c96aaf13def4f5 > core/src/main/scala/kafka/common/NoReplicaOnlineException.scala > a1e12794978adf79020936c71259bbdabca8ee68 > core/src/main/scala/kafka/controller/KafkaController.scala > a0267ae2670e8d5f365e49ec0fa5db1f62b815bf > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala > fd9200f3bf941aab54df798bb5899eeb552ea3a3 > core/src/main/scala/kafka/log/LogConfig.scala > 0b32aeeffcd9d4755ac90573448d197d3f729749 > core/src/main/scala/kafka/server/KafkaConfig.scala > 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala > 73e605eb31bc71642d48b0bb8bd1632fd70b9dca > core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala > b585f0ec0b1c402d95a3b34934dab7545dcfcb1f > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > PRE-CREATION > core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala > 89c207a3f56c7a7711f8cee6fb277626329882a6 > core/src/test/scala/unit/kafka/utils/TestUtils.scala > 426b1a7bea1d83a64081f2c6b672c88c928713b7 > > Diff: https://reviews.apache.org/r/17537/diff/ > > > Testing > --- > > > Thanks, > > Andrew Olson > >
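The safeguard described in this reply can be paraphrased with the hedged sketch below. The zkFetchTopicConfig parameter stands in for the on-demand ZooKeeper read of the per-topic override and is not a real Kafka API, and the reaction to the disallowed state (an exception here) is simplified for illustration.

{code}
// Illustrative sketch of the follower-side safeguard being discussed, not the
// actual ReplicaFetcherThread code.
object FollowerTruncationSketch {
  def handleLeaderBehindFollower(
      topic: String,
      followerLogEndOffset: Long,
      leaderLogEndOffset: Long,
      brokerDefaultUncleanElection: Boolean,
      zkFetchTopicConfig: String => Option[Boolean]): Long = {
    require(leaderLogEndOffset < followerLogEndOffset,
      "only reached when the follower is ahead of the elected leader")
    val uncleanElectionEnabled =
      zkFetchTopicConfig(topic).getOrElse(brokerDefaultUncleanElection)
    if (!uncleanElectionEnabled) {
      // This state should only be reachable if the per-topic unclean election
      // setting changed while the replica was down; otherwise a non-ISR leader
      // could not have been elected. Refuse to truncate and fail loudly.
      throw new IllegalStateException(
        s"Follower is ahead of the leader for $topic with unclean election disabled; " +
          "refusing to truncate committed data")
    }
    // Unclean election is allowed for this topic, so truncate to the leader's
    // log end offset and continue fetching from there.
    leaderLogEndOffset
  }
}
{code}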
Re: Review Request 17537: Patch for KAFKA-1028
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/ --- (Updated March 4, 2014, 12:48 a.m.) Review request for kafka. Bugs: KAFKA-1028 https://issues.apache.org/jira/browse/KAFKA-1028 Repository: kafka Description --- KAFKA-1028: per topic configuration of preference for consistency over availability Diffs (updated) - core/src/main/scala/kafka/admin/AdminUtils.scala 36ddeb44490e8343a4e8056c45726ac660e4b2f9 core/src/main/scala/kafka/common/NoReplicaOnlineException.scala a1e12794978adf79020936c71259bbdabca8ee68 core/src/main/scala/kafka/controller/KafkaController.scala b58cdcd16ffb62ba5329b8b2776f2bd18440b3a0 core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala fa29bbe82db35551f43ac487912fba7bae1b2599 core/src/main/scala/kafka/log/LogConfig.scala 0b32aeeffcd9d4755ac90573448d197d3f729749 core/src/main/scala/kafka/server/KafkaConfig.scala 04a5d39be3c41f5cb3589d95d0bd0f4fb4d7030d core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 73e605eb31bc71642d48b0bb8bd1632fd70b9dca core/src/test/scala/unit/kafka/admin/AdminTest.scala d5644ea40ec7678b975c4775546b79fcfa9f64b7 core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala b585f0ec0b1c402d95a3b34934dab7545dcfcb1f core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala PRE-CREATION core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 89c207a3f56c7a7711f8cee6fb277626329882a6 core/src/test/scala/unit/kafka/utils/TestUtils.scala 772d2140ed926a2f9f0c99aea60daf1a9b987073 Diff: https://reviews.apache.org/r/17537/diff/ Testing --- Thanks, Andrew Olson
Re: Review Request 17537: Patch for KAFKA-1028
> On Feb. 4, 2014, 11:21 p.m., Neha Narkhede wrote: > > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala, line 86 > > <https://reviews.apache.org/r/17537/diff/3/?file=456708#file456708line86> > > > > Reading from zookeeper is unnecessary here, since the broker has a > > mechanism to load per topic configs on the fly. Just accessing it through > > the config object should suffice > > Neha Narkhede wrote: > Here you could just do > > if(brokerConfig.uncleanLeaderElectionEnable) instead of if > (!AdminUtils.canElectUncleanLeaderForTopic(replicaMgr.zkClient, > topicAndPartition.topic, brokerConfig.uncleanLeaderElectionEnable)) > > To understand how the broker dynamically loads per topic configs, you can > look at TopicConfigManager. It registers a zookeeper listener on the topic > config change path and atomically switches the log config object to reflect > the new per topic config overrides. See justification in PartitionLeaderSelector comment thread above. - Andrew --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/#review33658 ------- On March 4, 2014, 12:48 a.m., Andrew Olson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17537/ > --- > > (Updated March 4, 2014, 12:48 a.m.) > > > Review request for kafka. > > > Bugs: KAFKA-1028 > https://issues.apache.org/jira/browse/KAFKA-1028 > > > Repository: kafka > > > Description > --- > > KAFKA-1028: per topic configuration of preference for consistency over > availability > > > Diffs > - > > core/src/main/scala/kafka/admin/AdminUtils.scala > 36ddeb44490e8343a4e8056c45726ac660e4b2f9 > core/src/main/scala/kafka/common/NoReplicaOnlineException.scala > a1e12794978adf79020936c71259bbdabca8ee68 > core/src/main/scala/kafka/controller/KafkaController.scala > b58cdcd16ffb62ba5329b8b2776f2bd18440b3a0 > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala > fa29bbe82db35551f43ac487912fba7bae1b2599 > core/src/main/scala/kafka/log/LogConfig.scala > 0b32aeeffcd9d4755ac90573448d197d3f729749 > core/src/main/scala/kafka/server/KafkaConfig.scala > 04a5d39be3c41f5cb3589d95d0bd0f4fb4d7030d > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala > 73e605eb31bc71642d48b0bb8bd1632fd70b9dca > core/src/test/scala/unit/kafka/admin/AdminTest.scala > d5644ea40ec7678b975c4775546b79fcfa9f64b7 > core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala > b585f0ec0b1c402d95a3b34934dab7545dcfcb1f > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > PRE-CREATION > core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala > 89c207a3f56c7a7711f8cee6fb277626329882a6 > core/src/test/scala/unit/kafka/utils/TestUtils.scala > 772d2140ed926a2f9f0c99aea60daf1a9b987073 > > Diff: https://reviews.apache.org/r/17537/diff/ > > > Testing > --- > > > Thanks, > > Andrew Olson > >
Re: Review Request 17537: Patch for KAFKA-1028
> On March 14, 2014, 8:13 p.m., Jay Kreps wrote: > > This is great! > > > > I don't think we should add a special purpose method that hard codes one > > property in AdminUtils, I think the helper code is actually there in > > LogConfig. > > > > I see Neha's point about adding a higher-level config to log config but I > > think it is worth it and not so bad as it is just a config--there is no > > major code coupling problem. > > > > The only real question I have which is really for Neha is whether adding > > these new ZK queries is a performance concern for leadership election and > > transfer? Should instead the controller or whoever just watch and > > constantly replicate these log configs all the time so that when election > > time comes no new fetches are required. This could perhaps be done as a > > follow-up improvement... > > > > Thanks for the feedback. The LogConfig.fromProps(...) approach works, but it looks like I'd need to update the implementation to provide default values for each of the props.getProperty(...) calls (it currently assumes that all supported properties are included in the provided Properties). Should I go ahead and make that change? - Andrew --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/#review37265 --- On March 4, 2014, 12:48 a.m., Andrew Olson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17537/ > --- > > (Updated March 4, 2014, 12:48 a.m.) > > > Review request for kafka. > > > Bugs: KAFKA-1028 > https://issues.apache.org/jira/browse/KAFKA-1028 > > > Repository: kafka > > > Description > --- > > KAFKA-1028: per topic configuration of preference for consistency over > availability > > > Diffs > - > > core/src/main/scala/kafka/admin/AdminUtils.scala > 36ddeb44490e8343a4e8056c45726ac660e4b2f9 > core/src/main/scala/kafka/common/NoReplicaOnlineException.scala > a1e12794978adf79020936c71259bbdabca8ee68 > core/src/main/scala/kafka/controller/KafkaController.scala > b58cdcd16ffb62ba5329b8b2776f2bd18440b3a0 > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala > fa29bbe82db35551f43ac487912fba7bae1b2599 > core/src/main/scala/kafka/log/LogConfig.scala > 0b32aeeffcd9d4755ac90573448d197d3f729749 > core/src/main/scala/kafka/server/KafkaConfig.scala > 04a5d39be3c41f5cb3589d95d0bd0f4fb4d7030d > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala > 73e605eb31bc71642d48b0bb8bd1632fd70b9dca > core/src/test/scala/unit/kafka/admin/AdminTest.scala > d5644ea40ec7678b975c4775546b79fcfa9f64b7 > core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala > b585f0ec0b1c402d95a3b34934dab7545dcfcb1f > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > PRE-CREATION > core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala > 89c207a3f56c7a7711f8cee6fb277626329882a6 > core/src/test/scala/unit/kafka/utils/TestUtils.scala > 772d2140ed926a2f9f0c99aea60daf1a9b987073 > > Diff: https://reviews.apache.org/r/17537/diff/ > > > Testing > --- > > > Thanks, > > Andrew Olson > >
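The "defaults for missing keys" change being discussed amounts to the pattern sketched below. This is not the real kafka.log.LogConfig; the case class, property names, and default values shown are assumptions for illustration only.

{code}
import java.util.Properties

// Minimal sketch of parsing a sparse per-topic override Properties object,
// where every lookup supplies a default instead of assuming the key is present.
// Not the actual LogConfig.fromProps implementation.
case class TopicConfigSketch(
    retentionMs: Long,
    segmentBytes: Int,
    uncleanLeaderElectionEnable: Boolean)

object TopicConfigSketch {
  val Defaults = TopicConfigSketch(
    retentionMs = 7L * 24 * 60 * 60 * 1000,
    segmentBytes = 1024 * 1024 * 1024,
    uncleanLeaderElectionEnable = true)

  // props.getProperty(key) returns null for absent keys; the two-argument form
  // falls back to the default so a partial override map parses safely.
  def fromProps(props: Properties): TopicConfigSketch = TopicConfigSketch(
    retentionMs = props.getProperty("retention.ms", Defaults.retentionMs.toString).toLong,
    segmentBytes = props.getProperty("segment.bytes", Defaults.segmentBytes.toString).toInt,
    uncleanLeaderElectionEnable = props
      .getProperty("unclean.leader.election.enable",
        Defaults.uncleanLeaderElectionEnable.toString).toBoolean)
}
{code}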
Re: Review Request 17537: Patch for KAFKA-1028
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/ --- (Updated March 17, 2014, 2:39 p.m.) Review request for kafka. Bugs: KAFKA-1028 https://issues.apache.org/jira/browse/KAFKA-1028 Repository: kafka Description --- KAFKA-1028: per topic configuration of preference for consistency over availability Diffs (updated) - core/src/main/scala/kafka/common/NoReplicaOnlineException.scala a1e12794978adf79020936c71259bbdabca8ee68 core/src/main/scala/kafka/controller/KafkaController.scala 5db24a73a62c725a76c79936abb15b1fda3b770b core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala fa29bbe82db35551f43ac487912fba7bae1b2599 core/src/main/scala/kafka/log/LogConfig.scala 18c86fed5efc765f9b059775988cc83ef0ef3c3b core/src/main/scala/kafka/server/KafkaConfig.scala d07796ec87fa9bf0d13e1690240b85979aba3224 core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 73e605eb31bc71642d48b0bb8bd1632fd70b9dca core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala b585f0ec0b1c402d95a3b34934dab7545dcfcb1f core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala PRE-CREATION core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 89c207a3f56c7a7711f8cee6fb277626329882a6 core/src/test/scala/unit/kafka/utils/TestUtils.scala 772d2140ed926a2f9f0c99aea60daf1a9b987073 Diff: https://reviews.apache.org/r/17537/diff/ Testing --- Thanks, Andrew Olson
Re: Review Request 17537: Patch for KAFKA-1028
> On March 14, 2014, 8:13 p.m., Jay Kreps wrote: > > This is great! > > > > I don't think we should add a special purpose method that hard codes one > > property in AdminUtils, I think the helper code is actually there in > > LogConfig. > > > > I see Neha's point about adding a higher-level config to log config but I > > think it is worth it and not so bad as it is just a config--there is no > > major code coupling problem. > > > > The only real question I have which is really for Neha is whether adding > > these new ZK queries is a performance concern for leadership election and > > transfer? Should instead the controller or whoever just watch and > > constantly replicate these log configs all the time so that when election > > time comes no new fetches are required. This could perhaps be done as a > > follow-up improvement... > > > > > > Andrew Olson wrote: > Thanks for the feedback. The LogConfig.fromProps(...) approach works, but > it looks like I'd need to update the implementation to provide default values > for each of the props.getProperty(...) calls (it currently assumes that all > supported properties are included in the provided Properties). Should I go > ahead and make that change? > > Neha Narkhede wrote: > I do think that the zookeeper read is a performance concern during > leadership changes, since the patch currently does one read per partition. I > think it is not so much a problem with the patch but more with us improving > our handling of per topic non-log configs. As you suggested, we should add > the ability to refresh per topic configs on the controller and possibly the > ReplicaManager as well. But I think we can do that as a follow up improvement > and accept this patch. > > Neha Narkhede wrote: > >> Thanks for the feedback. The LogConfig.fromProps(...) approach works, > but it looks like I'd need to update the implementation to provide default > values for each of the props.getProperty(...) calls (it currently assumes > that all supported properties are included in the provided Properties). > Should I go ahead and make that change? > > Yes please. Would you mind updating your patch with that change and also > rebasing it? I think we can check it in after that. Patch has been rebased and updated. - Andrew ------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/#review37265 --- On March 17, 2014, 2:39 p.m., Andrew Olson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17537/ > --- > > (Updated March 17, 2014, 2:39 p.m.) > > > Review request for kafka. 
> > > Bugs: KAFKA-1028 > https://issues.apache.org/jira/browse/KAFKA-1028 > > > Repository: kafka > > > Description > --- > > KAFKA-1028: per topic configuration of preference for consistency over > availability > > > Diffs > - > > core/src/main/scala/kafka/common/NoReplicaOnlineException.scala > a1e12794978adf79020936c71259bbdabca8ee68 > core/src/main/scala/kafka/controller/KafkaController.scala > 5db24a73a62c725a76c79936abb15b1fda3b770b > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala > fa29bbe82db35551f43ac487912fba7bae1b2599 > core/src/main/scala/kafka/log/LogConfig.scala > 18c86fed5efc765f9b059775988cc83ef0ef3c3b > core/src/main/scala/kafka/server/KafkaConfig.scala > d07796ec87fa9bf0d13e1690240b85979aba3224 > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala > 73e605eb31bc71642d48b0bb8bd1632fd70b9dca > core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala > b585f0ec0b1c402d95a3b34934dab7545dcfcb1f > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > PRE-CREATION > core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala > 89c207a3f56c7a7711f8cee6fb277626329882a6 > core/src/test/scala/unit/kafka/utils/TestUtils.scala > 772d2140ed926a2f9f0c99aea60daf1a9b987073 > > Diff: https://reviews.apache.org/r/17537/diff/ > > > Testing > --- > > > Thanks, > > Andrew Olson > >
Re: Review Request 17537: Patch for KAFKA-1028
> On March 17, 2014, 4:30 p.m., Jay Kreps wrote: > > core/src/main/scala/kafka/server/KafkaConfig.scala, line 256 > > <https://reviews.apache.org/r/17537/diff/4-5/?file=509273#file509273line256> > > > > Are these config changes due to needing a rebase or something? They > > don't seem related to the patch. Yes, the rebase this morning picked up KAFKA-1012 [1]. Should a rebase and patch code update be uploaded to the review board separately? [1] https://github.com/apache/kafka/commit/a670537aa33732b15b56644d8ccc1681e16395f5 - Andrew --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/#review37386 --- On March 17, 2014, 2:39 p.m., Andrew Olson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17537/ > --- > > (Updated March 17, 2014, 2:39 p.m.) > > > Review request for kafka. > > > Bugs: KAFKA-1028 > https://issues.apache.org/jira/browse/KAFKA-1028 > > > Repository: kafka > > > Description > --- > > KAFKA-1028: per topic configuration of preference for consistency over > availability > > > Diffs > - > > core/src/main/scala/kafka/common/NoReplicaOnlineException.scala > a1e12794978adf79020936c71259bbdabca8ee68 > core/src/main/scala/kafka/controller/KafkaController.scala > 5db24a73a62c725a76c79936abb15b1fda3b770b > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala > fa29bbe82db35551f43ac487912fba7bae1b2599 > core/src/main/scala/kafka/log/LogConfig.scala > 18c86fed5efc765f9b059775988cc83ef0ef3c3b > core/src/main/scala/kafka/server/KafkaConfig.scala > d07796ec87fa9bf0d13e1690240b85979aba3224 > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala > 73e605eb31bc71642d48b0bb8bd1632fd70b9dca > core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala > b585f0ec0b1c402d95a3b34934dab7545dcfcb1f > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > PRE-CREATION > core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala > 89c207a3f56c7a7711f8cee6fb277626329882a6 > core/src/test/scala/unit/kafka/utils/TestUtils.scala > 772d2140ed926a2f9f0c99aea60daf1a9b987073 > > Diff: https://reviews.apache.org/r/17537/diff/ > > > Testing > --- > > > Thanks, > > Andrew Olson > >
Review Request 17537: Patch for KAFKA-1028
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/ --- Review request for kafka. Bugs: KAFKA-1028 https://issues.apache.org/jira/browse/KAFKA-1028 Repository: kafka Description --- KAFKA-1028: per topic configuration of preference for consistency over availability Diffs - core/src/main/scala/kafka/common/NoReplicaOnlineException.scala a1e12794978adf79020936c71259bbdabca8ee68 core/src/main/scala/kafka/controller/KafkaController.scala a0267ae2670e8d5f365e49ec0fa5db1f62b815bf core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala fd9200f3bf941aab54df798bb5899eeb552ea3a3 core/src/main/scala/kafka/server/KafkaConfig.scala 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 73e605eb31bc71642d48b0bb8bd1632fd70b9dca core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala b585f0ec0b1c402d95a3b34934dab7545dcfcb1f core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala PRE-CREATION core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 89c207a3f56c7a7711f8cee6fb277626329882a6 core/src/test/scala/unit/kafka/utils/TestUtils.scala d88b6c3e8fd8098d540998b6a82a65cec8a1dcb0 Diff: https://reviews.apache.org/r/17537/diff/ Testing --- Thanks, Andrew Olson
Re: Review Request 17537: Patch for KAFKA-1028
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/ --- (Updated Jan. 30, 2014, 7:42 p.m.) Review request for kafka. Bugs: KAFKA-1028 https://issues.apache.org/jira/browse/KAFKA-1028 Repository: kafka Description --- KAFKA-1028: per topic configuration of preference for consistency over availability Diffs (updated) - core/src/main/scala/kafka/admin/AdminUtils.scala a167756f0fd358574c8ccb42c5c96aaf13def4f5 core/src/main/scala/kafka/common/NoReplicaOnlineException.scala a1e12794978adf79020936c71259bbdabca8ee68 core/src/main/scala/kafka/controller/KafkaController.scala a0267ae2670e8d5f365e49ec0fa5db1f62b815bf core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala fd9200f3bf941aab54df798bb5899eeb552ea3a3 core/src/main/scala/kafka/log/LogConfig.scala 0b32aeeffcd9d4755ac90573448d197d3f729749 core/src/main/scala/kafka/server/KafkaConfig.scala 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 73e605eb31bc71642d48b0bb8bd1632fd70b9dca core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala b585f0ec0b1c402d95a3b34934dab7545dcfcb1f core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala PRE-CREATION core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 89c207a3f56c7a7711f8cee6fb277626329882a6 core/src/test/scala/unit/kafka/utils/TestUtils.scala d88b6c3e8fd8098d540998b6a82a65cec8a1dcb0 Diff: https://reviews.apache.org/r/17537/diff/ Testing --- Thanks, Andrew Olson
Re: Review Request 17537: Patch for KAFKA-1028
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/ --- (Updated Jan. 30, 2014, 7:45 p.m.) Review request for kafka. Bugs: KAFKA-1028 https://issues.apache.org/jira/browse/KAFKA-1028 Repository: kafka Description --- KAFKA-1028: per topic configuration of preference for consistency over availability Diffs (updated) - core/src/main/scala/kafka/admin/AdminUtils.scala a167756f0fd358574c8ccb42c5c96aaf13def4f5 core/src/main/scala/kafka/common/NoReplicaOnlineException.scala a1e12794978adf79020936c71259bbdabca8ee68 core/src/main/scala/kafka/controller/KafkaController.scala a0267ae2670e8d5f365e49ec0fa5db1f62b815bf core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala fd9200f3bf941aab54df798bb5899eeb552ea3a3 core/src/main/scala/kafka/log/LogConfig.scala 0b32aeeffcd9d4755ac90573448d197d3f729749 core/src/main/scala/kafka/server/KafkaConfig.scala 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 73e605eb31bc71642d48b0bb8bd1632fd70b9dca core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala b585f0ec0b1c402d95a3b34934dab7545dcfcb1f core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala PRE-CREATION core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 89c207a3f56c7a7711f8cee6fb277626329882a6 core/src/test/scala/unit/kafka/utils/TestUtils.scala 426b1a7bea1d83a64081f2c6b672c88c928713b7 Diff: https://reviews.apache.org/r/17537/diff/ Testing --- Thanks, Andrew Olson
Re: Review Request 17537: Patch for KAFKA-1028
> On Feb. 4, 2014, 11:21 p.m., Neha Narkhede wrote: > > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala, line 64 > > <https://reviews.apache.org/r/17537/diff/3/?file=456705#file456705line64> > > > > the config object should already have the per topic preference for > > unclean leader election. So we don't have to read from zookeeper again. It doesn't look like this is actually the case. The KafkaConfig is passed from the KafkaServer to the KafkaController with no topic context, and the controller does not appear to be integrated with the topic log configuration logic in the TopicConfigManager/LogManager. Just to confirm my understanding of the code, I removed this Zookeeper read and doing so caused the two TopicOverride integration tests that I added to fail. Is there a simpler or less awkward way to implement this as per topic configuration? Reading the config on demand from ZK seems like the simplest and least invasive option since this should not be a frequently executed code path, but I could be missing something obvious. - Andrew --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17537/#review33658 ------- On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17537/ > --- > > (Updated Jan. 30, 2014, 7:45 p.m.) > > > Review request for kafka. > > > Bugs: KAFKA-1028 > https://issues.apache.org/jira/browse/KAFKA-1028 > > > Repository: kafka > > > Description > --- > > KAFKA-1028: per topic configuration of preference for consistency over > availability > > > Diffs > - > > core/src/main/scala/kafka/admin/AdminUtils.scala > a167756f0fd358574c8ccb42c5c96aaf13def4f5 > core/src/main/scala/kafka/common/NoReplicaOnlineException.scala > a1e12794978adf79020936c71259bbdabca8ee68 > core/src/main/scala/kafka/controller/KafkaController.scala > a0267ae2670e8d5f365e49ec0fa5db1f62b815bf > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala > fd9200f3bf941aab54df798bb5899eeb552ea3a3 > core/src/main/scala/kafka/log/LogConfig.scala > 0b32aeeffcd9d4755ac90573448d197d3f729749 > core/src/main/scala/kafka/server/KafkaConfig.scala > 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala > 73e605eb31bc71642d48b0bb8bd1632fd70b9dca > core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala > b585f0ec0b1c402d95a3b34934dab7545dcfcb1f > core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala > PRE-CREATION > core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala > 89c207a3f56c7a7711f8cee6fb277626329882a6 > core/src/test/scala/unit/kafka/utils/TestUtils.scala > 426b1a7bea1d83a64081f2c6b672c88c928713b7 > > Diff: https://reviews.apache.org/r/17537/diff/ > > > Testing > --- > > > Thanks, > > Andrew Olson > >
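The on-demand lookup described in this reply boils down to a helper like the sketch below, mirroring the AdminUtils.canElectUncleanLeaderForTopic(...) call mentioned elsewhere in this thread. The zkReadTopicConfig function is a hypothetical stand-in for the actual ZooKeeper fetch of the topic's override properties.

{code}
// Sketch of "read the per-topic override on demand, fall back to the broker
// default"; not the actual AdminUtils code.
object UncleanElectionConfigSketch {
  def canElectUncleanLeader(
      topic: String,
      brokerDefault: Boolean,
      zkReadTopicConfig: String => java.util.Properties): Boolean = {
    val overrides = zkReadTopicConfig(topic)
    Option(overrides.getProperty("unclean.leader.election.enable"))
      .map(_.toBoolean)         // an explicit per-topic override wins
      .getOrElse(brokerDefault) // otherwise use the broker-wide setting
  }
}
{code}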
Re: Review Request 17537: Patch for KAFKA-1028
> On Feb. 4, 2014, 11:21 p.m., Neha Narkhede wrote: > > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala, line 64 > > <https://reviews.apache.org/r/17537/diff/3/?file=456705#file456705line64> > > > > the config object should already have the per topic preference for > > unclean leader election. So we don't have to read from zookeeper again. > > Andrew Olson wrote: > It doesn't look like this is actually the case. The KafkaConfig is passed > from the KafkaServer to the KafkaController with no topic context, and the > controller does not appear to be integrated with the topic log configuration > logic in the TopicConfigManager/LogManager. > > Just to confirm my understanding of the code, I removed this Zookeeper > read and doing so caused the two TopicOverride integration tests that I added > to fail. Is there a simpler or less awkward way to implement this as per > topic configuration? Reading the config on demand from ZK seems like the > simplest and least invasive option since this should not be a frequently > executed code path, but I could be missing something obvious. > > Neha Narkhede wrote: > >> It doesn't look like this is actually the case. The KafkaConfig is > passed from the KafkaServer to the KafkaController with no topic context, and > the controller does not appear to be integrated with the topic log > configuration logic in the TopicConfigManager/LogManager. > > To understand how the broker dynamically loads per topic configs, you can > look at TopicConfigManager. It registers a zookeeper listener on the topic > config change path and atomically switches the log config object to reflect > the new per topic config overrides. > > I haven't looked at the tests in detail, but are you introducing the per > topic config overrides the same way as TopicCommand does (by writing to the > correct zk path)? That is the only way it will be automatically reloaded by > all the brokers, including the controller >> are you introducing the per topic config overrides the same way as >> TopicCommand does (by writing to the correct zk path)? Yes, I'm using AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(...) in the tests. >> To understand how the broker dynamically loads per topic configs, you can >> look at TopicConfigManager. After reading through the code some more, I think I understand how the TopicConfigManager works. It is currently only integrated with the LogManager [1]. The LogManager functionality is isolated from the KafkaController and ReplicaFetcherThread, which only have access to the base server KafkaConfig. The KafkaConfig is initialized when the broker starts and is immutable. The dynamic configuration updates you're referring to are done in the LogManager's map of LogConfig instances. I didn't want to introduce an abstraction leak by passing the LogManager instance to the controller and replica fetcher threads and making this new replica configuration part of the LogConfig. I also wasn't sure whether it was worth the effort and extra complexity to enhance the TopicConfigManager to also automatically reload replica-related configuration in addition to log-related configuration (i.e., adding new ReplicaConfig and ReplicaManager classes similar to LogConfig and LogManager), since there is currently only a single configuration property that is not very frequently checked. [1] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaServer.scala#L98 - Andrew --- This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/17537/#review33658 --- On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17537/ > --- > > (Updated Jan. 30, 2014, 7:45 p.m.) > > > Review request for kafka. > > > Bugs: KAFKA-1028 > https://issues.apache.org/jira/browse/KAFKA-1028 > > > Repository: kafka > > > Description > --- > > KAFKA-1028: per topic configuration of preference for consistency over > availability > > > Diffs > - > > core/src/main/scala/kafka/admin/AdminUtils.scala > a167756f0fd358574c8ccb42c5c96aaf13def4f5 > core/src/main/scala/kafka/common/NoReplicaOnlineException.scala > a1e12794978adf79020936c71259bbdabca8ee68 > core/src/main/scala/kafka/controller/KafkaController.scala > a
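The TopicConfigManager behavior summarized in the reply above (a ZooKeeper change listener that atomically swaps in new per-topic overrides, visible only to components that consult them) can be modeled roughly as follows. This is an illustrative sketch, not Kafka's implementation.

{code}
import java.util.concurrent.atomic.AtomicReference

// Illustrative model of the dynamic per-topic config reload pattern described
// above (TopicConfigManager-style); not Kafka's actual code.
class TopicOverridesSketch(initial: Map[String, Map[String, String]]) {
  private val overrides = new AtomicReference(initial)

  // Invoked by a change listener (in Kafka, a ZooKeeper watch on the topic
  // config change path); atomically swaps in the new overrides for one topic.
  def onTopicConfigChange(topic: String, newProps: Map[String, String]): Unit = {
    var done = false
    while (!done) {
      val current = overrides.get()
      done = overrides.compareAndSet(current, current.updated(topic, newProps))
    }
  }

  // Readers always see a consistent snapshot without locking; components that
  // never consult this structure (e.g. one holding only the static broker
  // config) never observe the overrides.
  def configFor(topic: String): Map[String, String] =
    overrides.get().getOrElse(topic, Map.empty)
}
{code}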
[jira] [Created] (KAFKA-6334) Minor documentation typo
Andrew Olson created KAFKA-6334: --- Summary: Minor documentation typo Key: KAFKA-6334 URL: https://issues.apache.org/jira/browse/KAFKA-6334 Project: Kafka Issue Type: Bug Components: documentation Affects Versions: 1.0.0 Reporter: Andrew Olson Priority: Trivial At [1]: {quote} 0.11.0 consumers support backwards compatibility with brokers 0.10.0 brokers and upward, so it is possible to upgrade the clients first before the brokers {quote} Specifically the "brokers 0.10.0 brokers" wording. [1] http://kafka.apache.org/documentation.html#upgrade_11_message_format -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-1379) Partition reassignment resets clock for time-based retention
[ https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson resolved KAFKA-1379. - Resolution: Fixed Assignee: Jiangjie Qin Fix Version/s: 0.10.1.0 Marking this bug as resolved in 0.10.1.0 based on the statement "The log retention time is no longer based on last modified time of the log segments. Instead it will be based on the largest timestamp of the messages in a log segment." in http://kafka.apache.org/documentation.html#upgrade_10_1_breaking > Partition reassignment resets clock for time-based retention > > > Key: KAFKA-1379 > URL: https://issues.apache.org/jira/browse/KAFKA-1379 > Project: Kafka > Issue Type: Bug > Components: log >Reporter: Joel Koshy >Assignee: Jiangjie Qin > Fix For: 0.10.1.0 > > > Since retention is driven off mod-times reassigned partitions will result in > data that has been on a leader to be retained for another full retention > cycle. E.g., if retention is seven days and you reassign partitions on the > sixth day then those partitions will remain on the replicas for another > seven days. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
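The quoted statement implies the behavioral difference sketched below: retention keyed on the largest message timestamp in a segment ignores the fresh file modification times that partition reassignment produces. The names and numbers are illustrative only.

{code}
// Small sketch of the retention rule cited above: a segment is eligible for
// deletion when its newest message is older than retention.ms, regardless of
// the file's modification time (which reassignment resets). Illustrative only.
object RetentionSketch {
  case class SegmentInfo(largestMessageTimestampMs: Long, fileMtimeMs: Long)

  def eligibleForDeletion(seg: SegmentInfo, nowMs: Long, retentionMs: Long): Boolean =
    nowMs - seg.largestMessageTimestampMs > retentionMs

  def main(args: Array[String]): Unit = {
    val now = System.currentTimeMillis()
    val sevenDays = 7L * 24 * 60 * 60 * 1000
    // A segment copied to a new replica six days ago (fresh mtime) whose newest
    // message is eight days old: deleted under timestamp-based retention, but it
    // would have survived another full cycle under mtime-based retention.
    val reassigned = SegmentInfo(
      largestMessageTimestampMs = now - 8L * 24 * 60 * 60 * 1000,
      fileMtimeMs = now - 6L * 24 * 60 * 60 * 1000)
    println(eligibleForDeletion(reassigned, now, sevenDays)) // true
  }
}
{code}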
[jira] [Commented] (KAFKA-3924) Data loss due to halting when LEO is larger than leader's LEO
[ https://issues.apache.org/jira/browse/KAFKA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383107#comment-15383107 ] Andrew Olson commented on KAFKA-3924: - Reviewed, looks good to me. > Data loss due to halting when LEO is larger than leader's LEO > - > > Key: KAFKA-3924 > URL: https://issues.apache.org/jira/browse/KAFKA-3924 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.0.0 >Reporter: Maysam Yabandeh > > Currently the follower broker panics when its LEO is larger than its leader's > LEO, and assuming that this is an impossible state to reach, halts the > process to prevent any further damage. > {code} > if (leaderEndOffset < replica.logEndOffset.messageOffset) { > // Prior to truncating the follower's log, ensure that doing so is not > disallowed by the configuration for unclean leader election. > // This situation could only happen if the unclean election > configuration for a topic changes while a replica is down. Otherwise, > // we should never encounter this situation since a non-ISR leader > cannot be elected if disallowed by the broker configuration. > if (!LogConfig.fromProps(brokerConfig.originals, > AdminUtils.fetchEntityConfig(replicaMgr.zkUtils, > ConfigType.Topic, > topicAndPartition.topic)).uncleanLeaderElectionEnable) { > // Log a fatal error and shutdown the broker to ensure that data loss > does not unexpectedly occur. > fatal("...") > Runtime.getRuntime.halt(1) > } > {code} > Firstly this assumption is invalid and there are legitimate cases (examples > below) that this case could actually occur. Secondly halt results into the > broker losing its un-flushed data, and if multiple brokers halt > simultaneously there is a chance that both leader and followers of a > partition are among the halted brokers, which would result into permanent > data loss. > Given that this is a legit case, we suggest to replace it with a graceful > shutdown to avoid propagating data loss to the entire cluster. > Details: > One legit case that this could actually occur is when a troubled broker > shrinks its partitions right before crashing (KAFKA-3410 and KAFKA-3861). In > this case the broker has lost some data but the controller cannot still > elects the others as the leader. If the crashed broker comes back up, the > controller elects it as the leader, and as a result all other brokers who are > now following it halt since they have LEOs larger than that of shrunk topics > in the restarted broker. We actually had a case that bringing up a crashed > broker simultaneously took down the entire cluster and as explained above > this could result into data loss. > The other legit case is when multiple brokers ungracefully shutdown at the > same time. In this case both of the leader and the followers lose their > un-flushed data but one of them has HW larger than the other. Controller > elects the one who comes back up sooner as the leader and if its LEO is less > than its future follower, the follower will halt (and probably lose more > data). Simultaneous ungrateful shutdown could happen due to hardware issue > (e.g., rack power failure), operator errors, or software issue (e.g., the > case above that is further explained in KAFKA-3410 and KAFKA-3861 and causes > simultaneous halts in multiple brokers) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
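The issue's proposal of replacing the halt with an orderly shutdown could look roughly like the sketch below. The initiateControlledShutdown() hook is hypothetical, not an existing Kafka API, and the function parameters merely model the surrounding broker machinery.

{code}
// Sketch of the change proposed in this issue: when the follower's LEO is ahead
// of the leader's and unclean election is disabled, shut the broker down cleanly
// (flush and deregister) instead of Runtime.getRuntime.halt(1), which discards
// un-flushed data. The shutdown hook here is hypothetical.
object GracefulShutdownSketch {
  def onFollowerAheadOfLeader(
      uncleanLeaderElectionEnable: Boolean,
      logFatal: String => Unit,
      initiateControlledShutdown: () => Unit,
      truncateToLeaderOffset: () => Unit): Unit = {
    if (!uncleanLeaderElectionEnable) {
      logFatal("Follower log end offset is ahead of the leader's and unclean leader " +
        "election is disabled; shutting down to avoid propagating data loss.")
      // Proposed behavior: orderly shutdown rather than an immediate halt.
      initiateControlledShutdown()
    } else {
      // Unclean election is allowed, so truncating to the leader's offset is acceptable.
      truncateToLeaderOffset()
    }
  }
}
{code}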
[jira] [Updated] (KAFKA-3971) Consumers drop from coordinator and cannot reconnect
[ https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-3971: Summary: Consumers drop from coordinator and cannot reconnect (was: Consumers drop from coordinator and cannot reconnet) > Consumers drop from coordinator and cannot reconnect > > > Key: KAFKA-3971 > URL: https://issues.apache.org/jira/browse/KAFKA-3971 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.9.0.1 > Environment: version 0.9.0.1 >Reporter: Lei Wang > Attachments: KAFKA-3971.txt > > > From time to time, we're creating new topics, and all consumers will pickup > those new topics. When starting to consume from these new topics, we often > see some of random consumers cannot connect to the coordinator. The log will > be flushed with the following log message tens of thousands every second: > {noformat} > 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > {noformat} > the servers seem working fine, and other consumers are also happy. > from the log, looks like it's retrying multiple times every millisecond but > all failing. > the same process are consuming from many topics, some of them are still > working well, but those random topics will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3971) Consumers drop from coordinator and cannot reconnect
[ https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415806#comment-15415806 ] Andrew Olson commented on KAFKA-3971: - How many partitions per topic do you have? > Consumers drop from coordinator and cannot reconnect > > > Key: KAFKA-3971 > URL: https://issues.apache.org/jira/browse/KAFKA-3971 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.9.0.1 > Environment: version 0.9.0.1 >Reporter: Lei Wang > Attachments: KAFKA-3971.txt > > > From time to time, we're creating new topics, and all consumers will pickup > those new topics. When starting to consume from these new topics, we often > see some of random consumers cannot connect to the coordinator. The log will > be flushed with the following log message tens of thousands every second: > {noformat} > 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > {noformat} > the servers seem working fine, and other consumers are also happy. > from the log, looks like it's retrying multiple times every millisecond but > all failing. > the same process are consuming from many topics, some of them are still > working well, but those random topics will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3971) Consumers drop from coordinator and cannot reconnect
[ https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415838#comment-15415838 ] Andrew Olson commented on KAFKA-3971: - Even at 1 partition, if a single process is consuming from that many topics its thread count would be very high - a separate fetcher thread started for each partition being consumed. So you may be at risk of running into a "max user processes" sort of ulimit or out of memory, especially if there is a substantial amount of data in each topic when the consumer is started, as each thread can read up to fetch.message.max.bytes worth of data. > Consumers drop from coordinator and cannot reconnect > > > Key: KAFKA-3971 > URL: https://issues.apache.org/jira/browse/KAFKA-3971 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.9.0.1 > Environment: version 0.9.0.1 >Reporter: Lei Wang > Attachments: KAFKA-3971.txt > > > From time to time, we're creating new topics, and all consumers will pickup > those new topics. When starting to consume from these new topics, we often > see some of random consumers cannot connect to the coordinator. The log will > be flushed with the following log message tens of thousands every second: > {noformat} > 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > 16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the > coordinator 2147483645 dead. > {noformat} > the servers seem working fine, and other consumers are also happy. > from the log, looks like it's retrying multiple times every millisecond but > all failing. > the same process are consuming from many topics, some of them are still > working well, but those random topics will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
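A back-of-the-envelope version of the estimate in this comment, using assumed (not reported) numbers, is sketched below.

{code}
// Rough sketch of the resource estimate described above: one fetcher thread per
// consumed partition, each able to buffer up to fetch.message.max.bytes. The
// figures here are illustrative assumptions, not the reporter's actual setup.
object ConsumerFootprintSketch {
  def main(args: Array[String]): Unit = {
    val topics = 500                       // topics consumed by one process (assumed)
    val partitionsPerTopic = 1
    val fetchMessageMaxBytes = 1024 * 1024 // 1 MiB per fetch (assumed)

    val fetcherThreads = topics * partitionsPerTopic
    val worstCaseFetchBufferBytes = fetcherThreads.toLong * fetchMessageMaxBytes

    println(s"fetcher threads: $fetcherThreads")
    println(f"worst-case fetch buffers: ${worstCaseFetchBufferBytes / (1024.0 * 1024)}%.1f MiB")
  }
}
{code}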
[jira] [Commented] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster
[ https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722341#comment-15722341 ] Andrew Olson commented on KAFKA-3228: - Since the exception message that we saw has been removed from the code, I agree that this should be marked resolved. The bug description for KAFKA-4214 makes sense for applicability to this scenario as well. > Partition reassignment failure for brokers freshly added to cluster > --- > > Key: KAFKA-3228 > URL: https://issues.apache.org/jira/browse/KAFKA-3228 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.8.2.1 >Reporter: Andrew Olson >Assignee: Neha Narkhede > Fix For: 0.10.1.0 > > > After adding about new 20 brokers to double the size of an existing > production Kafka deployment, when attempting to rebalance partitions we were > initially unable to reassign any partitions to 5 of the 20. There was no > problem with the other 15. The controller broker logged error messages like: > {noformat} > ERROR kafka.controller.KafkaController: [Controller 19]: Error completing > reassignment of partition [TOPIC-NAME,2] > kafka.common.KafkaException: Only 4,33 replicas out of the new set of > replicas 4,34,33 for partition [TOPIC-NAME,2] > to be reassigned are alive. Failing partition reassignment > at > kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) > at kafka.utils.Utils$.inLock(Utils.scala:535) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195) > at > scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) > at > scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) > at > kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195) > at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751) > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > {noformat} > We reattempted the reassignment to one of these new brokers, with the same > result. > We also saw these messages in the controller's log. There was a "Broken pipe" > error for each of the new brokers. > {noformat} > 2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: > [Controller-19-to-broker-34-send-thread], > Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... 
> java.io.IOException: Broken pipe > at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) > at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) > at sun.nio.ch.IOUtil.write(IOUtil.java:148) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) > at java.nio.channels.SocketChannel.write(SocketChannel.java:502) > at > kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) > at kafka.network.Send$class.writeCompletely(Transmission.scala:75) > at > kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) > at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) > at > kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60) > {noformat} > {noformat} > WARN kafka.controller.RequestSendThread: > [Controller-19-to-broker-34-send-thread], > Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to > broker id:34... > Reconnecting to broker. > java.io.EOFException: Received -1 when reading from channel, socket has > likely been closed. > at kafka.utils.Utils$.read(Utils.scala:381) > at > kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54
[jira] [Commented] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster
[ https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722343#comment-15722343 ] Andrew Olson commented on KAFKA-3228: - Added fix version and duplicate link. > Partition reassignment failure for brokers freshly added to cluster > --- > > Key: KAFKA-3228 > URL: https://issues.apache.org/jira/browse/KAFKA-3228 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.8.2.1 >Reporter: Andrew Olson >Assignee: Neha Narkhede > Fix For: 0.10.1.0 > > > After adding about new 20 brokers to double the size of an existing > production Kafka deployment, when attempting to rebalance partitions we were > initially unable to reassign any partitions to 5 of the 20. There was no > problem with the other 15. The controller broker logged error messages like: > {noformat} > ERROR kafka.controller.KafkaController: [Controller 19]: Error completing > reassignment of partition [TOPIC-NAME,2] > kafka.common.KafkaException: Only 4,33 replicas out of the new set of > replicas 4,34,33 for partition [TOPIC-NAME,2] > to be reassigned are alive. Failing partition reassignment > at > kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) > at kafka.utils.Utils$.inLock(Utils.scala:535) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195) > at > scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) > at > scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) > at > kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195) > at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751) > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > {noformat} > We reattempted the reassignment to one of these new brokers, with the same > result. > We also saw these messages in the controller's log. There was a "Broken pipe" > error for each of the new brokers. > {noformat} > 2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: > [Controller-19-to-broker-34-send-thread], > Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... 
> java.io.IOException: Broken pipe > at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) > at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) > at sun.nio.ch.IOUtil.write(IOUtil.java:148) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) > at java.nio.channels.SocketChannel.write(SocketChannel.java:502) > at > kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) > at kafka.network.Send$class.writeCompletely(Transmission.scala:75) > at > kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) > at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) > at > kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60) > {noformat} > {noformat} > WARN kafka.controller.RequestSendThread: > [Controller-19-to-broker-34-send-thread], > Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to > broker id:34... > Reconnecting to broker. > java.io.EOFException: Received -1 when reading from channel, socket has > likely been closed. > at kafka.utils.Utils$.read(Utils.scala:381) > at > kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) > at kafka.network.Receive$class.readCompletely(Transmission.scala:56) > at > kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
[jira] [Updated] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster
[ https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-3228: Fix Version/s: 0.10.1.0 > Partition reassignment failure for brokers freshly added to cluster > --- > > Key: KAFKA-3228 > URL: https://issues.apache.org/jira/browse/KAFKA-3228 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.8.2.1 >Reporter: Andrew Olson >Assignee: Neha Narkhede > Fix For: 0.10.1.0 > > > After adding about new 20 brokers to double the size of an existing > production Kafka deployment, when attempting to rebalance partitions we were > initially unable to reassign any partitions to 5 of the 20. There was no > problem with the other 15. The controller broker logged error messages like: > {noformat} > ERROR kafka.controller.KafkaController: [Controller 19]: Error completing > reassignment of partition [TOPIC-NAME,2] > kafka.common.KafkaException: Only 4,33 replicas out of the new set of > replicas 4,34,33 for partition [TOPIC-NAME,2] > to be reassigned are alive. Failing partition reassignment > at > kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) > at kafka.utils.Utils$.inLock(Utils.scala:535) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196) > at > kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195) > at > scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) > at > scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) > at > kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195) > at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751) > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > {noformat} > We reattempted the reassignment to one of these new brokers, with the same > result. > We also saw these messages in the controller's log. There was a "Broken pipe" > error for each of the new brokers. > {noformat} > 2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: > [Controller-19-to-broker-34-send-thread], > Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... 
> java.io.IOException: Broken pipe > at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) > at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) > at sun.nio.ch.IOUtil.write(IOUtil.java:148) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) > at java.nio.channels.SocketChannel.write(SocketChannel.java:502) > at > kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) > at kafka.network.Send$class.writeCompletely(Transmission.scala:75) > at > kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) > at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) > at > kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60) > {noformat} > {noformat} > WARN kafka.controller.RequestSendThread: > [Controller-19-to-broker-34-send-thread], > Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to > broker id:34... > Reconnecting to broker. > java.io.EOFException: Received -1 when reading from channel, socket has > likely been closed. > at kafka.utils.Utils$.read(Utils.scala:381) > at > kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) > at kafka.network.Receive$class.readCompletely(Transmission.scala:56) > at > kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29) > at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111) &
[jira] [Commented] (KAFKA-4092) retention.bytes should not be allowed to be less than segment.bytes
[ https://issues.apache.org/jira/browse/KAFKA-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890540#comment-15890540 ] Andrew Olson commented on KAFKA-4092: - Note that this Jira is being reverted in 0.10.2.1 (see KAFKA-4788 for details). > retention.bytes should not be allowed to be less than segment.bytes > --- > > Key: KAFKA-4092 > URL: https://issues.apache.org/jira/browse/KAFKA-4092 > Project: Kafka > Issue Type: Improvement > Components: log >Reporter: Dustin Cote >Assignee: Dustin Cote >Priority: Minor > Fix For: 0.10.2.0 > > > Right now retention.bytes can be as small as the user wants but it doesn't > really get acted on for the active segment if retention.bytes is smaller than > segment.bytes. We shouldn't allow retention.bytes to be less than > segment.bytes and validate that at startup. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
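For illustration, a minimal sketch of the startup validation described above. The class name and map layout are assumptions for the example, not the actual broker code; the defaults shown (-1 meaning unlimited retention.bytes, 1 GiB segment.bytes) follow the documented Kafka defaults.
{noformat}
// Hedged sketch only: hypothetical validator, not Kafka's LogConfig implementation.
import java.util.Map;

public final class TopicConfigValidator {
    public static void validate(Map<String, Long> config) {
        long retentionBytes = config.getOrDefault("retention.bytes", -1L);      // -1 = unlimited
        long segmentBytes = config.getOrDefault("segment.bytes", 1073741824L);  // 1 GiB default
        // Only a bounded retention.bytes smaller than the segment size is rejected.
        if (retentionBytes >= 0 && retentionBytes < segmentBytes) {
            throw new IllegalArgumentException("retention.bytes (" + retentionBytes
                + ") must not be less than segment.bytes (" + segmentBytes + ")");
        }
    }
}
{noformat}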
[jira] [Created] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped
Andrew Olson created KAFKA-4599: --- Summary: KafkaConsumer encounters SchemaException when Kafka broker stopped Key: KAFKA-4599 URL: https://issues.apache.org/jira/browse/KAFKA-4599 Project: Kafka Issue Type: Bug Components: consumer Reporter: Andrew Olson We recently observed an issue in production that can apparently occur a small percentage of the time when a Kafka broker is stopped. We're using version 0.9.0.1 for all brokers and clients. During a recent episode, 3 KafkaConsumer instances (out of approximately 100) ran into the following SchemaException within a few seconds of instructing the broker to shutdown. {noformat} 2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'responses': Error reading array of size 2774863, only 62 bytes available at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908) at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) {noformat} The exception message was slightly different for one consumer, {{Error reading field 'responses': Error reading array of size 2774863, only 260 bytes available}} The exception was not caught and caused the Storm Executor thread to restart, so it's not clear if it would have been transient or fatal for the KafkaConsumer. Here are the initial broker shutdown logs, {noformat} 2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], shutting down 2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-1-40], Shutting down 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-1-40], Stopped 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-1-40], Shutdown completed 2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-3-30], Shutting down 2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], Controlled shutdown succeeded 2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on Broker 4], Shutting down 2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on Broker 4], Shutdown completed {noformat} We've found one very similar reported occurrence, http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
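To illustrate the failure mode described above, here is a minimal, hedged sketch of a 0.9-style poll loop that catches unexpected client exceptions (SchemaException extends KafkaException) rather than letting them kill the processing thread. The broker address, group id and topic are placeholders; this is not the spout code referenced in the report.
{noformat}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.KafkaException;

public class ResilientPollLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder
        props.put("group.id", "example-group");           // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("example-topic"));
        while (true) {
            try {
                // poll(long) matches the 0.9.x client version mentioned in the report.
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.offset() + ": " + record.value());
                }
            } catch (KafkaException e) {
                // Includes SchemaException; log and keep polling instead of crashing the thread.
                System.err.println("Unexpected client error, continuing: " + e);
            }
        }
    }
}
{noformat}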
[jira] [Commented] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped
[ https://issues.apache.org/jira/browse/KAFKA-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802307#comment-15802307 ] Andrew Olson commented on KAFKA-4599: - Another recent similar reported occurrence from the mailing list, http://mail-archives.apache.org/mod_mbox/kafka-users/201612.mbox/%3CBY2PR02MB282F72F26267CAE415F2BA6D69B0%40BY2PR02MB282.namprd02.prod.outlook.com%3E Could the consumer code possibly be missing logic for handling a rare broker response? (For example like the issue fixed by https://github.com/apache/kafka/pull/2070) > KafkaConsumer encounters SchemaException when Kafka broker stopped > -- > > Key: KAFKA-4599 > URL: https://issues.apache.org/jira/browse/KAFKA-4599 > Project: Kafka > Issue Type: Bug > Components: consumer > Reporter: Andrew Olson > > We recently observed an issue in production that can apparently occur a small > percentage of the time when a Kafka broker is stopped. We're using version > 0.9.0.1 for all brokers and clients. > During a recent episode, 3 KafkaConsumer instances (out of approximately 100) > ran into the following SchemaException within a few seconds of instructing > the broker to shutdown. > {noformat} > 2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: > Error reading field 'responses': Error reading array of size 2774863, only 62 > bytes available > at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) > {noformat} > The exception message was slightly different for one consumer, > {{Error reading field 'responses': Error reading array of size 2774863, only > 260 bytes available}} > The exception was not caught and caused the Storm Executor thread to restart, > so it's not clear if it would have been transient or fatal for the > KafkaConsumer. 
> Here are the initial broker shutdown logs, > {noformat} > 2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], > shutting down > 2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-1-40], Shutting down > 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-1-40], Stopped > 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-1-40], Shutdown completed > 2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-3-30], Shutting down > 2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], > Controlled shutdown succeeded > 2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on > Broker 4], Shutting down > 2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on > Broker 4], Shutdown completed > {noformat} > We've found one very similar reported occurrence, > http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped
[ https://issues.apache.org/jira/browse/KAFKA-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804819#comment-15804819 ] Andrew Olson commented on KAFKA-4599: - We are not using either of those. We have implemented our own Storm spout that manages a KafkaConsumer and committing offsets. We're on version 0.10.0 for Storm and 0.9.0.1 for the Kafka client. > KafkaConsumer encounters SchemaException when Kafka broker stopped > -- > > Key: KAFKA-4599 > URL: https://issues.apache.org/jira/browse/KAFKA-4599 > Project: Kafka > Issue Type: Bug > Components: consumer >Reporter: Andrew Olson > > We recently observed an issue in production that can apparently occur a small > percentage of the time when a Kafka broker is stopped. We're using version > 0.9.0.1 for all brokers and clients. > During a recent episode, 3 KafkaConsumer instances (out of approximately 100) > ran into the following SchemaException within a few seconds of instructing > the broker to shutdown. > {noformat} > 2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: > Error reading field 'responses': Error reading array of size 2774863, only 62 > bytes available > at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) > {noformat} > The exception message was slightly different for one consumer, > {{Error reading field 'responses': Error reading array of size 2774863, only > 260 bytes available}} > The exception was not caught and caused the Storm Executor thread to restart, > so it's not clear if it would have been transient or fatal for the > KafkaConsumer. > Here are the initial broker shutdown logs, > {noformat} > 2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], > shutting down > 2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-1-40], Shutting down > 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-1-40], Stopped > 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-1-40], Shutdown completed > 2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-3-30], Shutting down > 2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], > Controlled shutdown succeeded > 2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on > Broker 4], Shutting down > 2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on > Broker 4], Shutdown completed > {noformat} > We've found one very similar reported occurrence, > http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1379) Partition reassignment resets clock for time-based retention
[ https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832155#comment-15832155 ] Andrew Olson commented on KAFKA-1379: - [~jjkoshy] / [~becket_qin] should this Jira now be closed as a duplicate of KAFKA-3163? https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index#KIP-33-Addatimebasedlogindex-Enforcetimebasedlogretention > Partition reassignment resets clock for time-based retention > > > Key: KAFKA-1379 > URL: https://issues.apache.org/jira/browse/KAFKA-1379 > Project: Kafka > Issue Type: Bug >Reporter: Joel Koshy > > Since retention is driven off mod-times reassigned partitions will result in > data that has been on a leader to be retained for another full retention > cycle. E.g., if retention is seven days and you reassign partitions on the > sixth day then those partitions will remain on the replicas for another > seven days. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
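A minimal sketch of the mod-time-driven retention check the description refers to (illustrative only, not the actual log manager code): because eligibility is computed from the segment file's last-modified time, replicas created by a reassignment start the retention clock over.
{noformat}
import java.io.File;

public class ModTimeRetentionSketch {
    // A segment is deletable once its age, measured from file modification time, exceeds retention.
    // Copying the segment to a new replica during reassignment resets lastModified(), so the
    // already-elapsed retention time is effectively forgotten on the new broker.
    static boolean eligibleForDeletion(File segmentFile, long retentionMs, long nowMs) {
        return nowMs - segmentFile.lastModified() > retentionMs;
    }
}
{noformat}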
[jira] [Updated] (KAFKA-1379) Partition reassignment resets clock for time-based retention
[ https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-1379: Component/s: log > Partition reassignment resets clock for time-based retention > > > Key: KAFKA-1379 > URL: https://issues.apache.org/jira/browse/KAFKA-1379 > Project: Kafka > Issue Type: Bug > Components: log >Reporter: Joel Koshy > > Since retention is driven off mod-times reassigned partitions will result in > data that has been on a leader to be retained for another full retention > cycle. E.g., if retention is seven days and you reassign partitions on the > sixth day then those partitions will remain on the replicas for another > seven days. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-2172) Round-robin partition assignment strategy too restrictive
[ https://issues.apache.org/jira/browse/KAFKA-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson resolved KAFKA-2172. - Resolution: Fixed Assignee: Andrew Olson Fix Version/s: 0.10.2.0 Fixed for new consumer in 0.9.0.0 and old consumer in 0.10.2.0, marking as resolved and setting fix version. > Round-robin partition assignment strategy too restrictive > - > > Key: KAFKA-2172 > URL: https://issues.apache.org/jira/browse/KAFKA-2172 > Project: Kafka > Issue Type: Bug >Reporter: Jason Rosenberg > Assignee: Andrew Olson > Fix For: 0.10.2.0 > > > The round-robin partition assignment strategy was introduced for the > high-level consumer, starting with 0.8.2.1. This appears to be a very > attractive feature, but it has an unfortunate restriction, which prevents it > from being easily utilized. That is that it requires all consumers in the > consumer group have identical topic regex selectors, and that they have the > same number of consumer threads. > It turns out this is not always the case for our deployments. It's not > unusual to run multiple consumers within a single process (with different > topic selectors), or we might have multiple processes dedicated for different > topic subsets. Agreed, we could change these to have separate group ids for > each sub topic selector (but unfortunately, that's easier said than done). > In several cases, we do at least have separate client.ids set for each > sub-consumer, so it would be incrementally better if we could at least loosen > the requirement such that each set of topics selected by a groupid/clientid > pair are the same. > But, if we want to do a rolling restart for a new version of a consumer > config, the cluster will likely be in a state where it's not possible to have > a single config until the full rolling restart completes across all nodes. > This results in a consumer outage while the rolling restart is happening. > Finally, it's especially problematic if we want to canary a new version for a > period before rolling to the whole cluster. > I'm not sure why this restriction should exist (as it obviously does not > exist for the 'range' assignment strategy). It seems it could be made to > work reasonably well with heterogeneous topic selection and heterogeneous > thread counts. The documentation states that "The round-robin partition > assignor lays out all the available partitions and all the available consumer > threads. It then proceeds to do a round-robin assignment from partition to > consumer thread." > If the assignor can "lay out all the available partitions and all the > available consumer threads", it should be able to uniformly assign partitions > to the available threads. In each case, if a thread belongs to a consumer > that doesn't have that partition selected, just move to the next available > thread that does have the selection, etc. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
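To make the loosened behavior concrete, here is a hedged sketch of round-robin assignment over heterogeneous subscriptions, along the lines suggested in the description (skip threads whose consumer is not subscribed to the partition's topic). It is an illustration, not the PartitionAssignor code from the patch, and it assumes every topic in the partition list has at least one subscriber.
{noformat}
import java.util.*;

public class RoundRobinSketch {
    // subscriptions: consumerId -> topics it subscribes to
    // partitions:    [topic, partitionId] pairs for all subscribed topics
    static Map<String, List<String>> assign(Map<String, Set<String>> subscriptions,
                                            List<String[]> partitions) {
        List<String> consumers = new ArrayList<>(subscriptions.keySet());
        Collections.sort(consumers);                               // deterministic ordering
        Map<String, List<String>> assignment = new HashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());

        int i = 0;
        for (String[] tp : partitions) {
            // Walk the circle of consumers, skipping any that are not subscribed to this topic.
            while (!subscriptions.get(consumers.get(i % consumers.size())).contains(tp[0])) {
                i++;
            }
            assignment.get(consumers.get(i % consumers.size())).add(tp[0] + "-" + tp[1]);
            i++;                                                   // advance so load stays spread out
        }
        return assignment;
    }
}
{noformat}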
[jira] [Commented] (KAFKA-1379) Partition reassignment resets clock for time-based retention
[ https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864128#comment-15864128 ] Andrew Olson commented on KAFKA-1379: - [~hachikuji] Jason, could you confirm if this bug has been fixed? > Partition reassignment resets clock for time-based retention > > > Key: KAFKA-1379 > URL: https://issues.apache.org/jira/browse/KAFKA-1379 > Project: Kafka > Issue Type: Bug > Components: log >Reporter: Joel Koshy > > Since retention is driven off mod-times reassigned partitions will result in > data that has been on a leader to be retained for another full retention > cycle. E.g., if retention is seven days and you reassign partitions on the > sixth day then those partitions will remain on the replicas for another > seven days. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-1379) Partition reassignment resets clock for time-based retention
[ https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864128#comment-15864128 ] Andrew Olson edited comment on KAFKA-1379 at 2/13/17 6:43 PM: -- [~hachikuji] Jason, could you confirm if this bug has been fixed? According to http://kafka.apache.org/documentation.html#upgrade_10_1_breaking it appears so. was (Author: noslowerdna): [~hachikuji] Jason, could you confirm if this bug has been fixed? > Partition reassignment resets clock for time-based retention > > > Key: KAFKA-1379 > URL: https://issues.apache.org/jira/browse/KAFKA-1379 > Project: Kafka > Issue Type: Bug > Components: log >Reporter: Joel Koshy > > Since retention is driven off mod-times reassigned partitions will result in > data that has been on a leader to be retained for another full retention > cycle. E.g., if retention is seven days and you reassign partitions on the > sixth day then those partitions will remain on the replicas for another > seven days. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-1524) Implement transactional producer
[ https://issues.apache.org/jira/browse/KAFKA-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873783#comment-15873783 ] Andrew Olson commented on KAFKA-1524: - Can this Jira be closed as obsolete? It appears to have been superseded by the design in https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging > Implement transactional producer > > > Key: KAFKA-1524 > URL: https://issues.apache.org/jira/browse/KAFKA-1524 > Project: Kafka > Issue Type: New Feature >Reporter: Joel Koshy >Assignee: Raul Castro Fernandez > Labels: transactions > Attachments: KAFKA-1524_2014-08-18_09:39:34.patch, > KAFKA-1524_2014-08-20_09:14:59.patch, KAFKA-1524.patch, KAFKA-1524.patch, > KAFKA-1524.patch > > > Implement the basic transactional producer functionality as outlined in > https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka > The scope of this jira is basic functionality (i.e., to be able to begin and > commit or abort a transaction) without the failure scenarios. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
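For reference, the transactional API that eventually shipped with KIP-98 (Kafka 0.11+) exposes the begin/commit/abort flow this issue originally scoped. A hedged usage sketch follows, with placeholder broker address, transactional.id and topic; error handling is simplified (a real client must treat fencing errors as fatal rather than aborting).
{noformat}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");      // placeholder
        props.put("transactional.id", "example-txn-id");     // required to enable transactions
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("example-topic", "key", "value"));
                producer.commitTransaction();
            } catch (KafkaException e) {
                producer.abortTransaction();                  // roll back the uncommitted writes
            }
        }
    }
}
{noformat}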
[jira] [Comment Edited] (KAFKA-1524) Implement transactional producer
[ https://issues.apache.org/jira/browse/KAFKA-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873783#comment-15873783 ] Andrew Olson edited comment on KAFKA-1524 at 2/19/17 6:16 PM: -- Can this Jira be closed as obsolete? It appears to have been superseded by the design in https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging Same comment applies to several other open Jiras with the "transactions" label. was (Author: noslowerdna): Can this Jira be closed as obsolete? It appears to have been superseded by the design in https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging > Implement transactional producer > > > Key: KAFKA-1524 > URL: https://issues.apache.org/jira/browse/KAFKA-1524 > Project: Kafka > Issue Type: New Feature >Reporter: Joel Koshy >Assignee: Raul Castro Fernandez > Labels: transactions > Attachments: KAFKA-1524_2014-08-18_09:39:34.patch, > KAFKA-1524_2014-08-20_09:14:59.patch, KAFKA-1524.patch, KAFKA-1524.patch, > KAFKA-1524.patch > > > Implement the basic transactional producer functionality as outlined in > https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka > The scope of this jira is basic functionality (i.e., to be able to begin and > commit or abort a transaction) without the failure scenarios. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-3866) KerberosLogin refresh time bug and other improvements
[ https://issues.apache.org/jira/browse/KAFKA-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878152#comment-15878152 ] Andrew Olson commented on KAFKA-3866: - We saw the following error when setting a very low expiration time (-maxlife "30 seconds" and -maxrenewlife "90 seconds"), {noformat}ERROR NextRefresh: Tue Feb 21 12:42:59 CST 2017 is in the past: exiting refresh thread. Check clock sync between this host and KDC - (KDC's clock is likely ahead of this host). Manual intervention will be required for this client to successfully authenticate. Exiting refresh thread. (org.apache.kafka.common.security.kerberos.KerberosLogin){noformat} Would this change fix that scenario? > KerberosLogin refresh time bug and other improvements > - > > Key: KAFKA-3866 > URL: https://issues.apache.org/jira/browse/KAFKA-3866 > Project: Kafka > Issue Type: Bug > Components: security >Affects Versions: 0.10.0.0 >Reporter: Ismael Juma >Assignee: Ismael Juma > > ZOOKEEPER-2295 describes a bug in the Kerberos refresh time logic that is > also present in our KerberosLogin class. While looking at the code, I found a > number of things that could be improved. More details in the PR. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
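For context, ticket lifetimes that short are typically applied to the principal with kadmin, roughly as follows (the principal name here is a placeholder, not taken from the comment):
{noformat}
kadmin: modprinc -maxlife "30 seconds" -maxrenewlife "90 seconds" kafka/broker01.example.com@EXAMPLE.COM
{noformat}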
[jira] [Comment Edited] (KAFKA-3866) KerberosLogin refresh time bug and other improvements
[ https://issues.apache.org/jira/browse/KAFKA-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878152#comment-15878152 ] Andrew Olson edited comment on KAFKA-3866 at 2/22/17 1:06 PM: -- We saw the following error when setting a very low expiration time (-maxlife "30 seconds" and -maxrenewlife "90 seconds"), {noformat}ERROR NextRefresh: Tue Feb 21 12:42:59 CST 2017 is in the past: exiting refresh thread. Check clock sync between this host and KDC - (KDC's clock is likely ahead of this host). Manual intervention will be required for this client to successfully authenticate. Exiting refresh thread. (org.apache.kafka.common.security.kerberos.KerberosLogin){noformat} Would this change fix that scenario? was (Author: noslowerdna): We saw the following error when setting a very low expiration time (-maxlife "30 seconds" and -maxrenewlife "90 seconds"), {noformat}ERROR NextRefresh: Tue Feb 21 12:42:59 CST 2017 is in the past: exiting refresh thread. Check clock sync between this host and KDC - (KDC's clock is likely ahead of this host). Manual intervention will be required for this client to successfully authenticate. Exiting refresh thread. (org.apache.kafka.common.security.kerberos.KerberosLogin){noformat} Would this change fix that scenario? > KerberosLogin refresh time bug and other improvements > - > > Key: KAFKA-3866 > URL: https://issues.apache.org/jira/browse/KAFKA-3866 > Project: Kafka > Issue Type: Bug > Components: security >Affects Versions: 0.10.0.0 >Reporter: Ismael Juma >Assignee: Ismael Juma > > ZOOKEEPER-2295 describes a bug in the Kerberos refresh time logic that is > also present in our KerberosLogin class. While looking at the code, I found a > number of things that could be improved. More details in the PR. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster
Andrew Olson created KAFKA-3228: --- Summary: Partition reassignment failure for brokers freshly added to cluster Key: KAFKA-3228 URL: https://issues.apache.org/jira/browse/KAFKA-3228 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 0.8.2.1 Reporter: Andrew Olson Assignee: Neha Narkhede After adding about new 20 brokers to double the size of an existing production Kafka deployment, when attempting to rebalance partitions we were initially unable to reassign any partitions to 5 of the 20. The controller broker logged error messages like: {noformat} ERROR kafka.controller.KafkaController: [Controller 19]: Error completing reassignment of partition [TOPIC-NAME,2] kafka.common.KafkaException: Only 4,33 replicas out of the new set of replicas 4,34,33 for partition [TOPIC-NAME,2] to be reassigned are alive. Failing partition reassignment at kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195) at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) at kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195) at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) {noformat} We reattempted the reassignment to one of these new brokers, with the same result. We also saw these messages in the controller's log. There was a "Broken pipe" error for each of the new brokers. {noformat} 2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: [Controller-19-to-broker-34-send-thread], Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) at sun.nio.ch.IOUtil.write(IOUtil.java:148) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) at java.nio.channels.SocketChannel.write(SocketChannel.java:502) at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) at kafka.network.Send$class.writeCompletely(Transmission.scala:75) at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132) at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60) {noformat} {noformat} WARN kafka.controller.RequestSendThread: [Controller-19-to-broker-34-send-thread], Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to broker id:34... Reconnecting to broker. 
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed. at kafka.utils.Utils$.read(Utils.scala:381) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Receive$class.readCompletely(Transmission.scala:56) at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29) at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111) at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133) at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60) {noformat} {noformat} INFO kafka.controller.RequestSendThread: [Controller-19-to-broker-34-send-thread], Controller 19 connected to id:34... for sending state change requests {noformat} There were no error messages in the new broker log files, just the normal startup logs. A jstack did not reveal anything unusual with the threads, and using netstat the network connection
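For reference, the reassignment in question is submitted with the standard tooling. An illustrative invocation matching the partition and replica set from the error above (the ZooKeeper address and file path are placeholders, not taken from the report):
{noformat}
$ cat reassign.json
{"version":1,"partitions":[{"topic":"TOPIC-NAME","partition":2,"replicas":[4,34,33]}]}
$ bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 --reassignment-json-file reassign.json --execute
$ bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 --reassignment-json-file reassign.json --verify
{noformat}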
[jira] [Updated] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster
[ https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-3228: Description: After adding about new 20 brokers to double the size of an existing production Kafka deployment, when attempting to rebalance partitions we were initially unable to reassign any partitions to 5 of the 20. There was no problem with the other 15. The controller broker logged error messages like: {noformat} ERROR kafka.controller.KafkaController: [Controller 19]: Error completing reassignment of partition [TOPIC-NAME,2] kafka.common.KafkaException: Only 4,33 replicas out of the new set of replicas 4,34,33 for partition [TOPIC-NAME,2] to be reassigned are alive. Failing partition reassignment at kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196) at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195) at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) at kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195) at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) {noformat} We reattempted the reassignment to one of these new brokers, with the same result. We also saw these messages in the controller's log. There was a "Broken pipe" error for each of the new brokers. {noformat} 2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: [Controller-19-to-broker-34-send-thread], Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) at sun.nio.ch.IOUtil.write(IOUtil.java:148) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) at java.nio.channels.SocketChannel.write(SocketChannel.java:502) at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) at kafka.network.Send$class.writeCompletely(Transmission.scala:75) at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132) at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60) {noformat} {noformat} WARN kafka.controller.RequestSendThread: [Controller-19-to-broker-34-send-thread], Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to broker id:34... Reconnecting to broker. java.io.EOFException: Received -1 when reading from channel, socket has likely been closed. 
at kafka.utils.Utils$.read(Utils.scala:381) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Receive$class.readCompletely(Transmission.scala:56) at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29) at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111) at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133) at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60) {noformat} {noformat} INFO kafka.controller.RequestSendThread: [Controller-19-to-broker-34-send-thread], Controller 19 connected to id:34... for sending state change requests {noformat} There were no error messages in the new broker log files, just the normal startup logs. A jstack did not reveal anything unusual with the threads, and using netstat the network connections looked normal. We're running version 0.8.2.1. The new brokers were simultaneously started using a broadcast-style command. However we also had the same issue with a different Kafka cluster after
[jira] [Commented] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163413#comment-15163413 ] Andrew Olson commented on KAFKA-493: [~cosmin.marginean] You might try upgrading to 0.9.0.1 which was just released last week. It has a couple of bug fixes (KAFKA-3159 and KAFKA-3003) that may be applicable. > High CPU usage on inactive server > - > > Key: KAFKA-493 > URL: https://issues.apache.org/jira/browse/KAFKA-493 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.8.0 >Reporter: Jay Kreps > Fix For: 0.10.1.0 > > Attachments: Kafka-2014-11-10.snapshot.zip, Kafka-sampling1.zip, > Kafka-sampling2.zip, Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, > Kafka-trace3.zip, backtraces.txt, stacktrace.txt > > > > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU > > usage is fairly high (13% of a > > core). Is that to be expected? I did look at the stack, but didn't see > > anything obvious. A background > > task? > > I wanted to mention how I am getting into this state. I've set up two > > machines with the latest 0.8 > > code base and am using a replication factor of 2. On starting the brokers > > there is no idle CPU activity. > > Then I run a test that essential does 10k publish operations followed by > > immediate consume operations > > (I was measuring latency). Once this has run the kafka nodes seem to > > consistently be consuming CPU > > essentially forever. > hprof results: > THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", > group="system") > THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 21, name="main", group="main") > THREAD START (obj=53ae, id = 27, name="Thread-2", group="main") > THREAD START (obj=53ae, id = 28, name="Thread-3", group="main") > THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", > group="main") > THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", > group="main") > THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main") > THREAD START (obj=574b, id = 200012, > name="ZkClient-EventThread-20-localhost:2181", group="main") > THREAD START (obj=576e, id = 200014, name="main-SendThread()", > group="main") > THREAD START (obj=576d, id = 200013, name="main-EventThread", > group="main") > THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", > group="main") > THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", > group="main") > THREAD START (obj=53ae, id = 200017, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200018, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", > group="main") > THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", > group="main") > THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main") > THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main") > THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on > broker 1, ", group="main") > THREAD START 
(obj=53ae, id = 200028, name="SIGINT handler", > group="system") > THREAD START (obj=53ae, id = 200029, name="Thread-5", group="main") > THREAD START (obj=574b, id = 200030, name="Thread-1", group="main") > THR
[jira] [Commented] (KAFKA-2435) More optimally balanced partition assignment strategy
[ https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167424#comment-15167424 ] Andrew Olson commented on KAFKA-2435: - [~hachikuji] The pull request for this (https://github.com/apache/kafka/pull/146) has been rebased. Please let me know if there are any concerns that need to be addressed for this enhancement to be accepted. > More optimally balanced partition assignment strategy > - > > Key: KAFKA-2435 > URL: https://issues.apache.org/jira/browse/KAFKA-2435 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > Attachments: KAFKA-2435.patch > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
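As a concrete illustration of the strategy described in this issue, here is a hedged sketch of the assignment loop: the most constrained topics are processed first, and each partition goes to the subscribed consumer that currently holds the fewest partitions. It is an illustration only, not the attached patch or the actual PartitionAssignor implementation. Starting with the most constrained topics keeps the most flexible choices available for later, which is the rationale the description gives.
{noformat}
import java.util.*;

public class FairAssignmentSketch {
    // subscriptions:   consumerId -> topics it subscribes to
    // partitionCounts: topic -> number of partitions
    static Map<String, List<String>> assign(Map<String, Set<String>> subscriptions,
                                            Map<String, Integer> partitionCounts) {
        Map<String, List<String>> consumersPerTopic = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : subscriptions.entrySet())
            for (String topic : e.getValue())
                consumersPerTopic.computeIfAbsent(topic, t -> new ArrayList<>()).add(e.getKey());

        // Topics with fewer interested consumers go first; ties favor the topic with more partitions.
        List<String> topics = new ArrayList<>(consumersPerTopic.keySet());
        topics.sort(Comparator.comparingInt((String t) -> consumersPerTopic.get(t).size())
                              .thenComparingInt(t -> -partitionCounts.get(t)));

        Map<String, List<String>> assignment = new HashMap<>();
        for (String c : subscriptions.keySet()) assignment.put(c, new ArrayList<>());

        for (String topic : topics) {
            for (int p = 0; p < partitionCounts.get(topic); p++) {
                // Give the partition to the interested consumer with the fewest partitions so far.
                String target = consumersPerTopic.get(topic).stream()
                        .min(Comparator.comparingInt(c -> assignment.get(c).size())).get();
                assignment.get(target).add(topic + "-" + p);
            }
        }
        return assignment;
    }
}
{noformat}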
[jira] [Commented] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)
[ https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167427#comment-15167427 ] Andrew Olson commented on KAFKA-2434: - [~junrao] The pull request for this (https://github.com/apache/kafka/pull/145) has been rebased. Please let me know if there are any concerns that need to be addressed for this patch to be accepted. > remove roundrobin identical topic constraint in consumer coordinator (old API) > -- > > Key: KAFKA-2434 > URL: https://issues.apache.org/jira/browse/KAFKA-2434 > Project: Kafka > Issue Type: Sub-task > Reporter: Andrew Olson > Assignee: Andrew Olson > Attachments: KAFKA-2434.patch > > > The roundrobin strategy algorithm improvement made in KAFKA-2196 should be > applied to the original high-level consumer API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2172) Round-robin partition assignment strategy too restrictive
[ https://issues.apache.org/jira/browse/KAFKA-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167431#comment-15167431 ] Andrew Olson commented on KAFKA-2172: - Note that the code changed in KAFKA-2196 was relocated to the new consumer library (+ ported from Scala -> Java) in KAFKA-2464. > Round-robin partition assignment strategy too restrictive > - > > Key: KAFKA-2172 > URL: https://issues.apache.org/jira/browse/KAFKA-2172 > Project: Kafka > Issue Type: Bug >Reporter: Jason Rosenberg > > The round-robin partition assignment strategy was introduced for the > high-level consumer, starting with 0.8.2.1. This appears to be a very > attractive feature, but it has an unfortunate restriction, which prevents it > from being easily utilized. That is that it requires all consumers in the > consumer group have identical topic regex selectors, and that they have the > same number of consumer threads. > It turns out this is not always the case for our deployments. It's not > unusual to run multiple consumers within a single process (with different > topic selectors), or we might have multiple processes dedicated for different > topic subsets. Agreed, we could change these to have separate group ids for > each sub topic selector (but unfortunately, that's easier said than done). > In several cases, we do at least have separate client.ids set for each > sub-consumer, so it would be incrementally better if we could at least loosen > the requirement such that each set of topics selected by a groupid/clientid > pair are the same. > But, if we want to do a rolling restart for a new version of a consumer > config, the cluster will likely be in a state where it's not possible to have > a single config until the full rolling restart completes across all nodes. > This results in a consumer outage while the rolling restart is happening. > Finally, it's especially problematic if we want to canary a new version for a > period before rolling to the whole cluster. > I'm not sure why this restriction should exist (as it obviously does not > exist for the 'range' assignment strategy). It seems it could be made to > work reasonably well with heterogeneous topic selection and heterogeneous > thread counts. The documentation states that "The round-robin partition > assignor lays out all the available partitions and all the available consumer > threads. It then proceeds to do a round-robin assignment from partition to > consumer thread." > If the assignor can "lay out all the available partitions and all the > available consumer threads", it should be able to uniformly assign partitions > to the available threads. In each case, if a thread belongs to a consumer > that doesn't have that partition selected, just move to the next available > thread that does have the selection, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)
Andrew Olson created KAFKA-3297: --- Summary: More optimally balanced partition assignment strategy (new consumer) Key: KAFKA-3297 URL: https://issues.apache.org/jira/browse/KAFKA-3297 Project: Kafka Issue Type: Improvement Reporter: Andrew Olson Assignee: Andrew Olson While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)
[ https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-3297: Description: While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. This JIRA addresses the new consumer. was: While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. > More optimally balanced partition assignment strategy (new consumer) > > > Key: KAFKA-3297 > URL: https://issues.apache.org/jira/browse/KAFKA-3297 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson >Assignee: Andrew Olson > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. 
As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the new consumer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy
[ https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-2435: Description: While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. This JIRA addresses the original high-level consumer. For the new consumer, see KAFKA-3297. was: While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. This JIRA addresses the original high-level consumer. > More optimally balanced partition assignment strategy > - > > Key: KAFKA-2435 > URL: https://issues.apache.org/jira/browse/KAFKA-2435 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson >Assignee: Andrew Olson > Attachments: KAFKA-2435.patch > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. 
As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the original high-level consumer. For the new consumer, > see KAFKA-3297. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy
[ https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-2435: Description: While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. This JIRA addresses the original high-level consumer. was: While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. > More optimally balanced partition assignment strategy > - > > Key: KAFKA-2435 > URL: https://issues.apache.org/jira/browse/KAFKA-2435 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson >Assignee: Andrew Olson > Attachments: KAFKA-2435.patch > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. 
As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the original high-level consumer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)
[ https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-3297: Description: While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. This JIRA addresses the new consumer. For the original high-level consumer, see KAFKA-2435. was: While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. This JIRA addresses the new consumer. > More optimally balanced partition assignment strategy (new consumer) > > > Key: KAFKA-3297 > URL: https://issues.apache.org/jira/browse/KAFKA-3297 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson >Assignee: Andrew Olson > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. 
As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the new consumer. For the original high-level consumer, > see KAFKA-2435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)
[ https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-3297 started by Andrew Olson. --- > More optimally balanced partition assignment strategy (new consumer) > > > Key: KAFKA-3297 > URL: https://issues.apache.org/jira/browse/KAFKA-3297 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the new consumer. For the original high-level consumer, > see KAFKA-2435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3298) Document unclean.leader.election.enable as a valid topic-level config
Andrew Olson created KAFKA-3298: --- Summary: Document unclean.leader.election.enable as a valid topic-level config Key: KAFKA-3298 URL: https://issues.apache.org/jira/browse/KAFKA-3298 Project: Kafka Issue Type: Bug Components: website Reporter: Andrew Olson Priority: Minor The http://kafka.apache.org/documentation.html#topic-config section does not currently include {{unclean.leader.election.enable}}. This is a valid topic-level configuration property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3298) Document unclean.leader.election.enable as a valid topic-level config
[ https://issues.apache.org/jira/browse/KAFKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-3298: Description: The http://kafka.apache.org/documentation.html#topic-config section does not currently include {{unclean.leader.election.enable}}. That is a valid topic-level configuration property as demonstrated by this [1] test. [1] https://github.com/apache/kafka/blob/0.9.0.1/core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala#L127 was:The http://kafka.apache.org/documentation.html#topic-config section does not currently include {{unclean.leader.election.enable}}. This is a valid topic-level configuration property. > Document unclean.leader.election.enable as a valid topic-level config > - > > Key: KAFKA-3298 > URL: https://issues.apache.org/jira/browse/KAFKA-3298 > Project: Kafka > Issue Type: Bug > Components: website > Reporter: Andrew Olson >Priority: Minor > > The http://kafka.apache.org/documentation.html#topic-config section does not > currently include {{unclean.leader.election.enable}}. That is a valid > topic-level configuration property as demonstrated by this [1] test. > [1] > https://github.com/apache/kafka/blob/0.9.0.1/core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala#L127 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
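For reference, the property named above is a per-topic override. A tiny illustrative snippet follows; the config key comes from the issue, while the class and method names are hypothetical, and how the resulting Properties object is applied (topic creation tooling, AdminUtils, or later admin APIs) depends on the Kafka version in use.

{code:java}
import java.util.Properties;

// Illustrative only: building the topic-level override discussed in KAFKA-3298.
public class UncleanElectionTopicOverride {
    public static Properties topicOverrides() {
        Properties topicConfig = new Properties();
        // Allow (true) or forbid (false) an out-of-sync replica to become the leader
        // for this particular topic, independently of the cluster-wide default.
        topicConfig.put("unclean.leader.election.enable", "false");
        return topicConfig;
    }
}
{code}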
[jira] [Updated] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)
[ https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-3297: Status: Patch Available (was: In Progress) > More optimally balanced partition assignment strategy (new consumer) > > > Key: KAFKA-3297 > URL: https://issues.apache.org/jira/browse/KAFKA-3297 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the new consumer. For the original high-level consumer, > see KAFKA-2435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2435) More optimally balanced partition assignment strategy
[ https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174235#comment-15174235 ] Andrew Olson commented on KAFKA-2435: - KIP link: https://cwiki.apache.org/confluence/display/KAFKA/KIP-49+-+Fair+Partition+Assignment+Strategy > More optimally balanced partition assignment strategy > - > > Key: KAFKA-2435 > URL: https://issues.apache.org/jira/browse/KAFKA-2435 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > Attachments: KAFKA-2435.patch > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the original high-level consumer. For the new consumer, > see KAFKA-3297. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)
[ https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174237#comment-15174237 ] Andrew Olson commented on KAFKA-3297: - KIP link: https://cwiki.apache.org/confluence/display/KAFKA/KIP-49+-+Fair+Partition+Assignment+Strategy > More optimally balanced partition assignment strategy (new consumer) > > > Key: KAFKA-3297 > URL: https://issues.apache.org/jira/browse/KAFKA-3297 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the new consumer. For the original high-level consumer, > see KAFKA-2435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)
[ https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-3297: Fix Version/s: 0.10.0.0 > More optimally balanced partition assignment strategy (new consumer) > > > Key: KAFKA-3297 > URL: https://issues.apache.org/jira/browse/KAFKA-3297 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > Fix For: 0.10.0.0 > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the new consumer. For the original high-level consumer, > see KAFKA-2435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy
[ https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-2435: Fix Version/s: 0.10.0.0 > More optimally balanced partition assignment strategy > - > > Key: KAFKA-2435 > URL: https://issues.apache.org/jira/browse/KAFKA-2435 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > Fix For: 0.10.0.0 > > Attachments: KAFKA-2435.patch > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. > This JIRA addresses the original high-level consumer. For the new consumer, > see KAFKA-3297. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)
[ https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-2434: Fix Version/s: 0.10.0.0 > remove roundrobin identical topic constraint in consumer coordinator (old API) > -- > > Key: KAFKA-2434 > URL: https://issues.apache.org/jira/browse/KAFKA-2434 > Project: Kafka > Issue Type: Sub-task > Reporter: Andrew Olson > Assignee: Andrew Olson > Fix For: 0.10.0.0 > > Attachments: KAFKA-2434.patch > > > The roundrobin strategy algorithm improvement made in KAFKA-2196 should be > applied to the original high-level consumer API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2196) remove roundrobin identical topic constraint in consumer coordinator
[ https://issues.apache.org/jira/browse/KAFKA-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680204#comment-14680204 ] Andrew Olson commented on KAFKA-2196: - Doesn't this [1] code also need to be updated? [1] https://github.com/apache/kafka/blob/0.8.2.1/core/src/main/scala/kafka/consumer/PartitionAssignor.scala#L78-82 > remove roundrobin identical topic constraint in consumer coordinator > > > Key: KAFKA-2196 > URL: https://issues.apache.org/jira/browse/KAFKA-2196 > Project: Kafka > Issue Type: Sub-task >Reporter: Onur Karaman >Assignee: Onur Karaman > Attachments: KAFKA-2196.patch > > > roundrobin doesn't need to make all consumers have identical topic > subscriptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)
[ https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson reassigned KAFKA-2434: --- Assignee: Andrew Olson > remove roundrobin identical topic constraint in consumer coordinator (old API) > -- > > Key: KAFKA-2434 > URL: https://issues.apache.org/jira/browse/KAFKA-2434 > Project: Kafka > Issue Type: Sub-task > Reporter: Andrew Olson > Assignee: Andrew Olson > > The roundrobin strategy algorithm improvement made in KAFKA-2196 should be > applied to the original high-level consumer API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)
Andrew Olson created KAFKA-2434: --- Summary: remove roundrobin identical topic constraint in consumer coordinator (old API) Key: KAFKA-2434 URL: https://issues.apache.org/jira/browse/KAFKA-2434 Project: Kafka Issue Type: Sub-task Reporter: Andrew Olson The roundrobin strategy algorithm improvement made in KAFKA-2196 should be applied to the original high-level consumer API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2196) remove roundrobin identical topic constraint in consumer coordinator
[ https://issues.apache.org/jira/browse/KAFKA-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697189#comment-14697189 ] Andrew Olson commented on KAFKA-2196: - > Doesn't this [1] code also need to be updated? Opened KAFKA-2434 for removing this constraint from the old consumer API. > remove roundrobin identical topic constraint in consumer coordinator > > > Key: KAFKA-2196 > URL: https://issues.apache.org/jira/browse/KAFKA-2196 > Project: Kafka > Issue Type: Sub-task >Reporter: Onur Karaman >Assignee: Onur Karaman > Attachments: KAFKA-2196.patch > > > roundrobin doesn't need to make all consumers have identical topic > subscriptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)
[ https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697192#comment-14697192 ] Andrew Olson commented on KAFKA-2434: - I will submit a patch for this change shortly. > remove roundrobin identical topic constraint in consumer coordinator (old API) > -- > > Key: KAFKA-2434 > URL: https://issues.apache.org/jira/browse/KAFKA-2434 > Project: Kafka > Issue Type: Sub-task > Reporter: Andrew Olson > Assignee: Andrew Olson > > The roundrobin strategy algorithm improvement made in KAFKA-2196 should be > applied to the original high-level consumer API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2435) More optimally balanced partition assignment strategy
Andrew Olson created KAFKA-2435: --- Summary: More optimally balanced partition assignment strategy Key: KAFKA-2435 URL: https://issues.apache.org/jira/browse/KAFKA-2435 Project: Kafka Issue Type: Improvement Reporter: Andrew Olson Assignee: Andrew Olson While the roundrobin partition assignment strategy is an improvement over the range strategy, when the consumer topic subscriptions are not identical (previously disallowed but will be possible as of KAFKA-2172) it can produce heavily skewed assignments. As suggested [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] it would be nice to have a strategy that attempts to assign an equal number of partitions to each consumer in a group, regardless of how similar their individual topic subscriptions are. We can accomplish this by tracking the number of partitions assigned to each consumer, and having the partition assignment loop assign each partition to a consumer interested in that topic with the least number of partitions assigned. Additionally, we can optimize the distribution fairness by adjusting the partition assignment order: * Topics with fewer consumers are assigned first. * In the event of a tie for least consumers, the topic with more partitions is assigned first. The general idea behind these two rules is to keep the most flexible assignment choices available as long as possible by starting with the most constrained partitions/consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)
[ https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697264#comment-14697264 ] Andrew Olson commented on KAFKA-2434: - Created reviewboard https://reviews.apache.org/r/37480/diff/ against branch origin/trunk > remove roundrobin identical topic constraint in consumer coordinator (old API) > -- > > Key: KAFKA-2434 > URL: https://issues.apache.org/jira/browse/KAFKA-2434 > Project: Kafka > Issue Type: Sub-task > Reporter: Andrew Olson > Assignee: Andrew Olson > Attachments: KAFKA-2434.patch > > > The roundrobin strategy algorithm improvement made in KAFKA-2196 should be > applied to the original high-level consumer API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)
[ https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-2434: Status: Patch Available (was: Open) > remove roundrobin identical topic constraint in consumer coordinator (old API) > -- > > Key: KAFKA-2434 > URL: https://issues.apache.org/jira/browse/KAFKA-2434 > Project: Kafka > Issue Type: Sub-task > Reporter: Andrew Olson > Assignee: Andrew Olson > Attachments: KAFKA-2434.patch > > > The roundrobin strategy algorithm improvement made in KAFKA-2196 should be > applied to the original high-level consumer API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)
[ https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-2434: Attachment: KAFKA-2434.patch > remove roundrobin identical topic constraint in consumer coordinator (old API) > -- > > Key: KAFKA-2434 > URL: https://issues.apache.org/jira/browse/KAFKA-2434 > Project: Kafka > Issue Type: Sub-task > Reporter: Andrew Olson > Assignee: Andrew Olson > Attachments: KAFKA-2434.patch > > > The roundrobin strategy algorithm improvement made in KAFKA-2196 should be > applied to the original high-level consumer API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2435) More optimally balanced partition assignment strategy
[ https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697266#comment-14697266 ] Andrew Olson commented on KAFKA-2435: - Created reviewboard https://reviews.apache.org/r/37481/diff/ against branch origin/trunk > More optimally balanced partition assignment strategy > - > > Key: KAFKA-2435 > URL: https://issues.apache.org/jira/browse/KAFKA-2435 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > Attachments: KAFKA-2435.patch > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy
[ https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-2435: Status: Patch Available (was: Open) > More optimally balanced partition assignment strategy > - > > Key: KAFKA-2435 > URL: https://issues.apache.org/jira/browse/KAFKA-2435 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > Attachments: KAFKA-2435.patch > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy
[ https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-2435: Attachment: KAFKA-2435.patch > More optimally balanced partition assignment strategy > - > > Key: KAFKA-2435 > URL: https://issues.apache.org/jira/browse/KAFKA-2435 > Project: Kafka > Issue Type: Improvement > Reporter: Andrew Olson > Assignee: Andrew Olson > Attachments: KAFKA-2435.patch > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2172) Round-robin partition assignment strategy too restrictive
[ https://issues.apache.org/jira/browse/KAFKA-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697662#comment-14697662 ] Andrew Olson commented on KAFKA-2172: - [~jjkoshy] I've implemented a new assignment algorithm similar to what Bryan described above that appears to work reasonably well across a wide variety of scenarios - see KAFKA-2435. > Round-robin partition assignment strategy too restrictive > - > > Key: KAFKA-2172 > URL: https://issues.apache.org/jira/browse/KAFKA-2172 > Project: Kafka > Issue Type: Bug >Reporter: Jason Rosenberg > > The round-robin partition assignment strategy was introduced for the > high-level consumer, starting with 0.8.2.1. This appears to be a very > attractive feature, but it has an unfortunate restriction, which prevents it > from being easily utilized. That is that it requires all consumers in the > consumer group to have identical topic regex selectors, and that they have the > same number of consumer threads. > It turns out this is not always the case for our deployments. It's not > unusual to run multiple consumers within a single process (with different > topic selectors), or we might have multiple processes dedicated for different > topic subsets. Agreed, we could change these to have separate group ids for > each sub topic selector (but unfortunately, that's easier said than done). > In several cases, we do at least have separate client.ids set for each > sub-consumer, so it would be incrementally better if we could at least loosen > the requirement such that each set of topics selected by a groupid/clientid > pair are the same. > But, if we want to do a rolling restart for a new version of a consumer > config, the cluster will likely be in a state where it's not possible to have > a single config until the full rolling restart completes across all nodes. > This results in a consumer outage while the rolling restart is happening. > Finally, it's especially problematic if we want to canary a new version for a > period before rolling to the whole cluster. > I'm not sure why this restriction should exist (as it obviously does not > exist for the 'range' assignment strategy). It seems it could be made to > work reasonably well with heterogeneous topic selection and heterogeneous > thread counts. The documentation states that "The round-robin partition > assignor lays out all the available partitions and all the available consumer > threads. It then proceeds to do a round-robin assignment from partition to > consumer thread." > If the assignor can "lay out all the available partitions and all the > available consumer threads", it should be able to uniformly assign partitions > to the available threads. In each case, if a thread belongs to a consumer > that doesn't have that partition selected, just move to the next available > thread that does have the selection, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2435) More optimally balanced partition assignment strategy
[ https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699565#comment-14699565 ] Andrew Olson commented on KAFKA-2435: - [~becket_qin] Thanks for the info, I'll send a pull request. This new strategy specifically targets the case where the consumer topic subscriptions are not all identical, which I don't believe that KAFKA-2019 addresses -- it looks like it would probably reduce the potential for a large imbalance, but only if there is more than one thread per consumer. > More optimally balanced partition assignment strategy > - > > Key: KAFKA-2435 > URL: https://issues.apache.org/jira/browse/KAFKA-2435 > Project: Kafka > Issue Type: Improvement >Reporter: Andrew Olson >Assignee: Andrew Olson > Attachments: KAFKA-2435.patch > > > While the roundrobin partition assignment strategy is an improvement over the > range strategy, when the consumer topic subscriptions are not identical > (previously disallowed but will be possible as of KAFKA-2172) it can produce > heavily skewed assignments. As suggested > [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767] > it would be nice to have a strategy that attempts to assign an equal number > of partitions to each consumer in a group, regardless of how similar their > individual topic subscriptions are. We can accomplish this by tracking the > number of partitions assigned to each consumer, and having the partition > assignment loop assign each partition to a consumer interested in that topic > with the least number of partitions assigned. > Additionally, we can optimize the distribution fairness by adjusting the > partition assignment order: > * Topics with fewer consumers are assigned first. > * In the event of a tie for least consumers, the topic with more partitions > is assigned first. > The general idea behind these two rules is to keep the most flexible > assignment choices available as long as possible by starting with the most > constrained partitions/consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4152) Reduce severity level of metadata fetch failure logging for nonexistent topics
Andrew Olson created KAFKA-4152: --- Summary: Reduce severity level of metadata fetch failure logging for nonexistent topics Key: KAFKA-4152 URL: https://issues.apache.org/jira/browse/KAFKA-4152 Project: Kafka Issue Type: Improvement Components: clients Reporter: Andrew Olson If a consumer proactively subscribes to one or more topics that don't already exist, but are expected to exist in the near future, warnings are repeatedly logged by the NetworkClient throughout the consumer's lifetime until the topic eventually gets created, for example: {noformat} org.apache.kafka.clients.NetworkClient [WARN] Error while fetching metadata with correlation id 1 : {MY.NEW.TOPIC=UNKNOWN_TOPIC_OR_PARTITION, ANOTHER.NEW.TOPIC=UNKNOWN_TOPIC_OR_PARTITION} {noformat} The NetworkClient's warning logging code for metadata fetch failures is rather generic, but it could potentially examine the failure reason, logging at debug level for UNKNOWN_TOPIC_OR_PARTITION and at warn level for all others. Since these warnings can be valuable for troubleshooting in some situations, a reasonable compromise might be to remember the unknown topics for which a warning has already been logged, and reduce the log level from warn to debug for subsequent failures of those same topics for the same common cause of not (yet, presumably) existing, although that does introduce some undesirable complexity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
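A rough sketch of the "remember what was already warned about" option follows. It is purely illustrative rather than actual NetworkClient code: the class and method names are made up, SLF4J is assumed for logging, and the error is represented as a plain string.

{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Demotes repeated UNKNOWN_TOPIC_OR_PARTITION warnings for the same topic to debug,
// while keeping the first occurrence (and all other errors) at warn level.
public class MetadataFailureLogger {
    private static final Logger log = LoggerFactory.getLogger(MetadataFailureLogger.class);
    private final Set<String> alreadyWarnedUnknownTopics = ConcurrentHashMap.newKeySet();

    public void logMetadataErrors(Map<String, String> topicErrors) {
        for (Map.Entry<String, String> entry : topicErrors.entrySet()) {
            String topic = entry.getKey();
            String error = entry.getValue();
            if ("UNKNOWN_TOPIC_OR_PARTITION".equals(error)) {
                if (alreadyWarnedUnknownTopics.add(topic)) {
                    log.warn("Error while fetching metadata for {}: {} (topic may not exist yet)", topic, error);
                } else {
                    log.debug("Topic {} is still unknown", topic);
                }
            } else {
                log.warn("Error while fetching metadata for {}: {}", topic, error);
            }
        }
    }
}
{code}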
[jira] [Commented] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError
[ https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531011#comment-15531011 ] Andrew Olson commented on KAFKA-3990: - I think this Jira can be closed as a duplicate of KAFKA-2512, version and magic byte verification should address this. We saw the same thing with the new Consumer when its bootstrap.servers was accidentally set to the host:port of a Kafka Offset Monitor (https://github.com/quantifind/KafkaOffsetMonitor) service. > Kafka New Producer may raise an OutOfMemoryError > > > Key: KAFKA-3990 > URL: https://issues.apache.org/jira/browse/KAFKA-3990 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 > Environment: Docker, Base image : CentOS > Java 8u77 > Marathon >Reporter: Brice Dutheil > Attachments: app-producer-config.log, kafka-broker-logs.zip > > > We are regularly seeing OOME errors on a kafka producer, we first saw : > {code} > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[na:1.8.0_77] > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77] > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) > ~[kafka-clients-0.9.0.1.jar:na] > at org.apache.kafka.common.network.Selector.poll(Selector.java:286) > ~[kafka-clients-0.9.0.1.jar:na] > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > ~[kafka-clients-0.9.0.1.jar:na] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77] > {code} > This line refer to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} > (see > https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93) > Usually the app runs fine within 200/400 MB heap and a 64 MB Metaspace. And > we are producing small messages 500B at most. > Also the error don't appear on the devlopment environment, in order to > identify the issue we tweaked the code to give us actual data of the > allocation size, we got this stack : > {code} > 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN > o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1' > 09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN > o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : > NetworkReceive.readFromReadableChannel.receiveSize=1213486160 > java.lang.OutOfMemoryError: Java heap space > Dumping heap to /tmp/tomcat.hprof ... 
> Heap dump file created [69583827 bytes in 0.365 secs] > 09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR > o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread > | producer-1: > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[na:1.8.0_77] > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77] > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) > ~[kafka-clients-0.9.0.1.jar:na] > at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) > ~[kafka-clients-0.9.0.1.jar:na] > at org.apache.kafka.common.network.Selector.poll(Selector.java:286) > ~[kafka-clients-0.9.0.1.jar:na] > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) > ~[kafka-clients-0.9.0.1.jar:na] > at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > ~[kafka-clients-0.9.0.1.jar:na] > at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > ~[kafka-client
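One detail that supports the duplicate/misconfiguration explanation above: the logged receive size 1213486160 is 0x48545450, i.e. the ASCII bytes "HTTP", which is exactly what the client would read as a 4-byte Kafka size prefix if the configured endpoint is actually speaking HTTP (such as the Kafka Offset Monitor web UI mentioned in the comment). A small self-contained check, for illustration only:

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Decodes the suspicious allocation size from the stack trace above back into
// the four ASCII bytes the client actually read off the wire.
public class ReceiveSizeDecoder {
    public static void main(String[] args) {
        int receiveSize = 1213486160; // value logged just before the OutOfMemoryError
        byte[] sizePrefix = ByteBuffer.allocate(4).putInt(receiveSize).array();
        System.out.println(new String(sizePrefix, StandardCharsets.US_ASCII)); // prints "HTTP"
    }
}
{code}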
[jira] [Updated] (KAFKA-4180) Shared authentication with multiple active Kafka producers/consumers
[ https://issues.apache.org/jira/browse/KAFKA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-4180: Summary: Shared authentication with multiple active Kafka producers/consumers (was: Shared authentification with multiple actives Kafka producers/consumers) > Shared authentication with multiple active Kafka producers/consumers > > > Key: KAFKA-4180 > URL: https://issues.apache.org/jira/browse/KAFKA-4180 > Project: Kafka > Issue Type: Bug > Components: producer , security >Affects Versions: 0.10.0.1 >Reporter: Guillaume Grossetie >Assignee: Mickael Maison > Labels: authentication, jaas, loginmodule, plain, producer, > sasl, user > > I'm using Kafka 0.10.0.1 with an SASL authentication on the client: > {code:title=kafka_client_jaas.conf|borderStyle=solid} > KafkaClient { > org.apache.kafka.common.security.plain.PlainLoginModule required > username="guillaume" > password="secret"; > }; > {code} > When using multiple Kafka producers the authentification is shared [1]. In > other words it's not currently possible to have multiple Kafka producers in a > JVM process. > Am I missing something ? How can I have multiple active Kafka producers with > different credentials ? > My use case is that I have an application that send messages to multiples > clusters (one cluster for logs, one cluster for metrics, one cluster for > business data). > [1] > https://github.com/apache/kafka/blob/69ebf6f7be2fc0e471ebd5b7a166468017ff2651/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L35 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
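For context on how per-client credentials were eventually made possible: later client releases added a sasl.jaas.config client property (KIP-85, dynamic JAAS configuration) so each producer or consumer can carry its own login module entry instead of relying on the process-wide static JAAS file. A hedged sketch is below; the broker address, username, and password are placeholders, and the exact release introducing the property should be checked against the upgrade notes.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: a producer for the "logs" cluster with its own SASL/PLAIN credentials,
// supplied via the client-level sasl.jaas.config property.
public class PerProducerCredentials {
    public static KafkaProducer<String, String> logsProducer() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "logs-cluster:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"logs-user\" password=\"logs-secret\";");
        return new KafkaProducer<>(props);
    }
}
{code}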
[jira] [Commented] (KAFKA-4362) Consumer can fail after reassignment of the offsets topic partition
[ https://issues.apache.org/jira/browse/KAFKA-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628871#comment-15628871 ] Andrew Olson commented on KAFKA-4362: - Is there documentation that advises for a COORDINATOR_NOT_AVAILABLE error when committing offsets the KafkaConsumer should be closed, and a new instance created? Or is the expectation that any CommitFailedException should be handled in that way? We have seen some cases where a call to poll() before retrying a failed commit due to ILLEGAL_GENERATION or UNKNOWN_MEMBER_ID allows the consumer to recover. > Consumer can fail after reassignment of the offsets topic partition > --- > > Key: KAFKA-4362 > URL: https://issues.apache.org/jira/browse/KAFKA-4362 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.1.0 >Reporter: Joel Koshy >Assignee: Mayuresh Gharat > > When a consumer offsets topic partition reassignment completes, an offset > commit shows this: > {code} > java.lang.IllegalArgumentException: Message format version for partition 100 > not found > at > kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633) > ~[kafka_2.10.jar:?] > at > kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633) > ~[kafka_2.10.jar:?] > at scala.Option.getOrElse(Option.scala:120) ~[scala-library-2.10.4.jar:?] > at > kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632) > ~[kafka_2.10.jar:?] > at > ... > {code} > The issue is that the replica has been deleted so the > {{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} throws this > exception instead which propagates as an unknown error. > Unfortunately consumers don't respond to this and will fail their offset > commits. > One workaround in the above situation is to bounce the cluster - the consumer > will be forced to rediscover the group coordinator. > (Incidentally, the message incorrectly prints the number of partitions > instead of the actual partition.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
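The recovery pattern mentioned in the comment above (calling poll() before retrying a failed commit) might look roughly like the sketch below. This is not documented or required handling, just a minimal illustration; it assumes a client version with poll(Duration), and a real consumer loop would also need to process any records returned by the recovery poll.

{code:java}
import java.time.Duration;
import java.util.Map;
import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitRetry {

    /**
     * Tries to commit the given offsets, calling poll() once before a single
     * retry if the first attempt fails. The poll gives the consumer a chance
     * to rejoin the group (e.g. after ILLEGAL_GENERATION / UNKNOWN_MEMBER_ID)
     * before the commit is attempted again.
     */
    static void commitWithRecovery(KafkaConsumer<String, String> consumer,
                                   Map<TopicPartition, OffsetAndMetadata> offsets) {
        try {
            consumer.commitSync(offsets);
        } catch (CommitFailedException first) {
            // Records returned here are ignored for brevity; handle them in real code.
            consumer.poll(Duration.ZERO);
            consumer.commitSync(offsets); // retry once; let a second failure propagate
        }
    }
}
{code}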
[jira] [Resolved] (KAFKA-7631) NullPointerException when SCRAM is allowed but ScramLoginModule is not in broker's jaas.conf
[ https://issues.apache.org/jira/browse/KAFKA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson resolved KAFKA-7631. - Resolution: Fixed Marking as resolved since I believe it is. > NullPointerException when SCRAM is allowed bu ScramLoginModule is not in > broker's jaas.conf > --- > > Key: KAFKA-7631 > URL: https://issues.apache.org/jira/browse/KAFKA-7631 > Project: Kafka > Issue Type: Improvement > Components: security >Affects Versions: 2.0.0, 2.5.0 >Reporter: Andras Beni >Priority: Minor > Fix For: 2.7.0 > > Attachments: KAFKA-7631.patch > > > When user wants to use delegation tokens and lists {{SCRAM}} in > {{sasl.enabled.mechanisms}}, but does not add {{ScramLoginModule}} to > broker's JAAS configuration, a null pointer exception is thrown on broker > side and the connection is closed. > Meaningful error message should be logged and sent back to the client. > {code} > java.lang.NullPointerException > at > org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.handleSaslToken(SaslServerAuthenticator.java:376) > at > org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:262) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:127) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:489) > at org.apache.kafka.common.network.Selector.poll(Selector.java:427) > at kafka.network.Processor.poll(SocketServer.scala:679) > at kafka.network.Processor.run(SocketServer.scala:584) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-9233) Kafka consumer throws undocumented IllegalStateException
Andrew Olson created KAFKA-9233: --- Summary: Kafka consumer throws undocumented IllegalStateException Key: KAFKA-9233 URL: https://issues.apache.org/jira/browse/KAFKA-9233 Project: Kafka Issue Type: Bug Components: consumer Affects Versions: 2.3.0 Reporter: Andrew Olson If the provided collection of TopicPartition instances contains any duplicates, an IllegalStateException not documented in the javadoc is thrown by internal Java stream code when calling KafkaConsumer#beginningOffsets or KafkaConsumer#endOffsets. The stack trace looks like this, {noformat} java.lang.IllegalStateException: Duplicate key -2 at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133) at java.util.HashMap.merge(HashMap.java:1254) at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320) at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.apache.kafka.clients.consumer.internals.Fetcher.beginningOrEndOffset(Fetcher.java:555) at org.apache.kafka.clients.consumer.internals.Fetcher.beginningOffsets(Fetcher.java:542) at org.apache.kafka.clients.consumer.KafkaConsumer.beginningOffsets(KafkaConsumer.java:2054) at org.apache.kafka.clients.consumer.KafkaConsumer.beginningOffsets(KafkaConsumer.java:2031) {noformat} {noformat} java.lang.IllegalStateException: Duplicate key -1 at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133) at java.util.HashMap.merge(HashMap.java:1254) at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320) at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.apache.kafka.clients.consumer.internals.Fetcher.beginningOrEndOffset(Fetcher.java:559) at org.apache.kafka.clients.consumer.internals.Fetcher.endOffsets(Fetcher.java:550) at org.apache.kafka.clients.consumer.KafkaConsumer.endOffsets(KafkaConsumer.java:2109) at org.apache.kafka.clients.consumer.KafkaConsumer.endOffsets(KafkaConsumer.java:2081) {noformat} Looking at the code, it appears this may likely have been introduced by KAFKA-7831. The exception is not thrown in Kafka 2.2.1, with the duplicated TopicPartition values silently ignored. Either we should document this exception possibility (probably wrapping it with a Kafka exception class) indicating invalid client API usage, or restore the previous behavior where the duplicates were harmless. -- This message was sent by Atlassian Jira (v8.3.4#803005)
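Until the behavior is settled one way or the other, one workaround on affected versions is simply to de-duplicate the TopicPartition collection before calling the offset lookup methods. A minimal sketch:

{code:java}
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetsLookup {

    /**
     * Workaround for the duplicate-key IllegalStateException: collapse any
     * duplicate TopicPartition entries before asking the consumer for offsets.
     */
    static Map<TopicPartition, Long> safeEndOffsets(KafkaConsumer<?, ?> consumer,
                                                    Collection<TopicPartition> partitions) {
        // LinkedHashSet preserves caller order while removing duplicates.
        return consumer.endOffsets(new LinkedHashSet<>(partitions));
    }
}
{code}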
[jira] [Commented] (KAFKA-2114) Unable to change min.insync.replicas default
[ https://issues.apache.org/jira/browse/KAFKA-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737079#comment-14737079 ] Andrew Olson commented on KAFKA-2114: - Yes, that is correct. > Unable to change min.insync.replicas default > > > Key: KAFKA-2114 > URL: https://issues.apache.org/jira/browse/KAFKA-2114 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.8.2.0, 0.8.2.1 >Reporter: Bryan Baugher >Assignee: Gwen Shapira > Fix For: 0.8.3 > > Attachments: KAFKA-2114.patch > > > Following the comment here[1] I was unable to change the min.insync.replicas > default value. I tested this by setting up a 3 node cluster, wrote to a topic > with a replication factor of 3, using request.required.acks=-1 and setting > min.insync.replicas=2 on the broker's server.properties. I then shutdown 2 > brokers but I was still able to write successfully. Only after running the > alter topic command setting min.insync.replicas=2 on the topic did I see > write failures. > [1] - > http://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3CCANZ-JHF71yqKE6%2BKKhWe2EGUJv6R3bTpoJnYck3u1-M35sobgg%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2114) Unable to change min.insync.replicas default
[ https://issues.apache.org/jira/browse/KAFKA-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-2114: Affects Version/s: 0.8.2.0 0.8.2.1 > Unable to change min.insync.replicas default > > > Key: KAFKA-2114 > URL: https://issues.apache.org/jira/browse/KAFKA-2114 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.8.2.0, 0.8.2.1 >Reporter: Bryan Baugher >Assignee: Gwen Shapira > Fix For: 0.8.3 > > Attachments: KAFKA-2114.patch > > > Following the comment here[1] I was unable to change the min.insync.replicas > default value. I tested this by setting up a 3 node cluster, wrote to a topic > with a replication factor of 3, using request.required.acks=-1 and setting > min.insync.replicas=2 on the broker's server.properties. I then shutdown 2 > brokers but I was still able to write successfully. Only after running the > alter topic command setting min.insync.replicas=2 on the topic did I see > write failures. > [1] - > http://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3CCANZ-JHF71yqKE6%2BKKhWe2EGUJv6R3bTpoJnYck3u1-M35sobgg%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)
[ https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936850#comment-14936850 ] Andrew Olson commented on KAFKA-2434: - [~junrao] Yes, this was only intended for parity with the original bug-fix implementation in KAFKA-2196. I noticed the distribution uniformity deficiency as well and consequently also submitted a patch for a new "fair" algorithm - see KAFKA-2435. > remove roundrobin identical topic constraint in consumer coordinator (old API) > -- > > Key: KAFKA-2434 > URL: https://issues.apache.org/jira/browse/KAFKA-2434 > Project: Kafka > Issue Type: Sub-task >Reporter: Andrew Olson >Assignee: Andrew Olson > Attachments: KAFKA-2434.patch > > > The roundrobin strategy algorithm improvement made in KAFKA-2196 should be > applied to the original high-level consumer API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
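To make the "distribution uniformity deficiency" concrete, the toy simulation below (not the actual PartitionAssignor code) assigns partitions in circular order to whichever consumers subscribe to each topic. With differing subscriptions, plain round-robin can leave one consumer with noticeably more partitions than a fairness-aware assignment would; in this example c1 receives 5 partitions and c2 receives 3, whereas a fair split would be 4 and 4.

{code:java}
import java.util.*;

/** Toy simulation of round-robin assignment with non-identical subscriptions. */
public class RoundRobinSkewDemo {

    public static void main(String[] args) {
        Map<String, Integer> topicPartitions = new LinkedHashMap<>();
        topicPartitions.put("topicA", 6);
        topicPartitions.put("topicB", 2);

        // c1 subscribes to both topics, c2 only to topicA.
        Map<String, Set<String>> subscriptions = new LinkedHashMap<>();
        subscriptions.put("c1", new HashSet<>(Arrays.asList("topicA", "topicB")));
        subscriptions.put("c2", new HashSet<>(Collections.singletonList("topicA")));

        List<String> consumers = new ArrayList<>(subscriptions.keySet());
        Map<String, Integer> assignedCounts = new LinkedHashMap<>();
        consumers.forEach(c -> assignedCounts.put(c, 0));

        int cursor = 0;
        for (Map.Entry<String, Integer> topic : topicPartitions.entrySet()) {
            for (int p = 0; p < topic.getValue(); p++) {
                // Advance circularly to the next consumer subscribed to this topic
                // (every topic here has at least one subscriber).
                while (!subscriptions.get(consumers.get(cursor % consumers.size())).contains(topic.getKey())) {
                    cursor++;
                }
                assignedCounts.merge(consumers.get(cursor % consumers.size()), 1, Integer::sum);
                cursor++;
            }
        }
        System.out.println(assignedCounts); // {c1=5, c2=3}
    }
}
{code}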
[jira] [Commented] (KAFKA-2189) Snappy compression of message batches less efficient in 0.8.2.1
[ https://issues.apache.org/jira/browse/KAFKA-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948989#comment-14948989 ] Andrew Olson commented on KAFKA-2189: - [~jbrosenb...@gmail.com] Yes your assumption is correct, this was only an efficiency/chunking issue and not any protocol incompatibility. > Snappy compression of message batches less efficient in 0.8.2.1 > --- > > Key: KAFKA-2189 > URL: https://issues.apache.org/jira/browse/KAFKA-2189 > Project: Kafka > Issue Type: Bug > Components: build, compression, log >Affects Versions: 0.8.2.1 >Reporter: Olson,Andrew >Assignee: Ismael Juma >Priority: Blocker > Labels: trivial > Fix For: 0.9.0.0, 0.8.2.2 > > Attachments: KAFKA-2189.patch > > > We are using snappy compression and noticed a fairly substantial increase > (about 2.25x) in log filesystem space consumption after upgrading a Kafka > cluster from 0.8.1.1 to 0.8.2.1. We found that this is caused by messages > being seemingly recompressed individually (or possibly with a much smaller > buffer or dictionary?) instead of as a batch as sent by producers. We > eventually tracked down the change in compression ratio/scope to this [1] > commit that updated the snappy version from 1.0.5 to 1.1.1.3. The Kafka > client version does not appear to be relevant as we can reproduce this with > both the 0.8.1.1 and 0.8.2.1 Producer. > Here are the log files from our troubleshooting that contain the same set of > 1000 messages, for batch sizes of 1, 10, 100, and 1000. f9d9b was the last > commit with 0.8.1.1-like behavior prior to f5ab8 introducing the issue. > {noformat} > -rw-rw-r-- 1 kafka kafka 404967 May 12 11:45 > /var/kafka2/f9d9b-batch-1-0/.log > -rw-rw-r-- 1 kafka kafka 119951 May 12 11:45 > /var/kafka2/f9d9b-batch-10-0/.log > -rw-rw-r-- 1 kafka kafka 89645 May 12 11:45 > /var/kafka2/f9d9b-batch-100-0/.log > -rw-rw-r-- 1 kafka kafka 88279 May 12 11:45 > /var/kafka2/f9d9b-batch-1000-0/.log > -rw-rw-r-- 1 kafka kafka 402837 May 12 11:41 > /var/kafka2/f5ab8-batch-1-0/.log > -rw-rw-r-- 1 kafka kafka 382437 May 12 11:41 > /var/kafka2/f5ab8-batch-10-0/.log > -rw-rw-r-- 1 kafka kafka 364791 May 12 11:41 > /var/kafka2/f5ab8-batch-100-0/.log > -rw-rw-r-- 1 kafka kafka 380693 May 12 11:41 > /var/kafka2/f5ab8-batch-1000-0/.log > {noformat} > [1] > https://github.com/apache/kafka/commit/f5ab8e1780cf80f267906e3259ad4f9278c32d28 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
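The chunking effect described above can be illustrated with snappy-java (the library Kafka depends on) directly: compressing 1000 small, similar messages one at a time loses the cross-message redundancy that batch compression exploits. This is only an illustration of the general effect, not a reproduction of the broker recompression path, and the message shape is made up for the example.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;
import org.xerial.snappy.Snappy;

public class SnappyBatchDemo {

    public static void main(String[] args) throws Exception {
        Random random = new Random(42);
        char[] filler = new char[450];
        Arrays.fill(filler, 'x');

        byte[][] messages = new byte[1000][];
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < messages.length; i++) {
            // ~500-byte messages with some shared structure, as a producer might send.
            String msg = "event-" + (i % 10) + "," + random.nextLong() + "," + new String(filler);
            messages[i] = msg.getBytes(StandardCharsets.UTF_8);
            batch.append(msg);
        }

        long perMessageTotal = 0;
        for (byte[] m : messages) {
            perMessageTotal += Snappy.compress(m).length; // each message compressed alone
        }
        int batchedTotal = Snappy.compress(batch.toString().getBytes(StandardCharsets.UTF_8)).length;

        System.out.println("compressed individually: " + perMessageTotal + " bytes");
        System.out.println("compressed as one batch:  " + batchedTotal + " bytes");
    }
}
{code}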
[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910490#comment-13910490 ] Andrew Olson commented on KAFKA-1028: - [~nehanarkhede], can you respond to my latest comment on the reviewboard? Thanks! > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen >Assignee: Neha Narkhede > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-1028: Attachment: KAFKA-1028_2014-03-03_18:48:43.patch > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen >Assignee: Neha Narkhede > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918842#comment-13918842 ] Andrew Olson commented on KAFKA-1028: - Updated reviewboard https://reviews.apache.org/r/17537/ against branch origin/trunk > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen >Assignee: Neha Narkhede > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918871#comment-13918871 ] Andrew Olson commented on KAFKA-1028: - Patch has been updated with the suggested changes. Note that I made one additional change which was found to be necessary after rebasing against the latest trunk code (detected by integration test failures). It appears that the recent KAFKA-1235 changes result in a Leader of -1 (none) and empty ISR being saved for all partitions after all brokers in the cluster have been shutdown using a controlled shutdown. This essentially forces an unclean leader election to occur when a broker is subsequently restarted. If we have the configuration set to disallow unclean elections, then we have permanently blocked our ability to restore a leader. To address this, I've added logic that prevents the ISR from being updated to an empty set for any topics that do not allow unclean election. The last surviving member of the ISR will be preserved in Zookeeper in the event that a final, lone leader broker goes offline -- allowing this previous leader to resume its leader role when it comes back online. The partition leader is still updated to -1, which is not problematic. Please review lines 975-986 of KafkaController and let me know if this change makes sense. > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen >Assignee: Neha Narkhede > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
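The ISR-preservation rule described in the comment can be expressed roughly as the helper below. This is an illustrative Java sketch, not the actual Scala KafkaController change; the method name and shape are invented for the example.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class IsrShrinkRule {

    /**
     * Returns the new ISR after {@code departingBroker} goes offline.
     * For topics that disallow unclean leader election, the last surviving
     * ISR member is preserved (rather than shrinking the ISR to empty) so it
     * can resume leadership when it comes back online.
     */
    static List<Integer> newIsr(List<Integer> currentIsr, int departingBroker, boolean uncleanLeaderElectionEnabled) {
        boolean wouldBeEmpty = currentIsr.size() == 1 && currentIsr.contains(departingBroker);
        if (wouldBeEmpty && !uncleanLeaderElectionEnabled) {
            return new ArrayList<>(currentIsr); // keep the lone member; the leader still becomes -1
        }
        List<Integer> updated = new ArrayList<>(currentIsr);
        updated.remove(Integer.valueOf(departingBroker));
        return updated;
    }
}
{code}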
[jira] [Updated] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-1028: Attachment: KAFKA-1028_2014-03-17_09:39:05.patch > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen >Assignee: Neha Narkhede > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937865#comment-13937865 ] Andrew Olson commented on KAFKA-1028: - Updated reviewboard https://reviews.apache.org/r/17537/ against branch origin/trunk > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen >Assignee: Neha Narkhede > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946673#comment-13946673 ] Andrew Olson commented on KAFKA-1028: - [~jkreps]/[~nehanarkhede] Would it be possible for this to be included in the 0.8.2 release? > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen > Assignee: Andrew Olson > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946686#comment-13946686 ] Andrew Olson commented on KAFKA-1028: - Sounds good, thanks. > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen > Assignee: Andrew Olson > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948159#comment-13948159 ] Andrew Olson commented on KAFKA-1028: - The tests included with the patch do verify consistency and non-availability, but they just shutdown and restart brokers rather than simulating a temporary network partition. A few months ago I tried to get Jepsen+Kafka up and running, without much success. I agree that a system test would be good to have. I'll take a look at that framework to see how feasible it would be to add some tests for this. > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen >Assignee: Andrew Olson > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1179) createMessageStreams() in javaapi.ZookeeperConsumerConnector does not throw
[ https://issues.apache.org/jira/browse/KAFKA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950899#comment-13950899 ] Andrew Olson commented on KAFKA-1179: - Using the Java API, calling createMessageStreams(...) multiple times using the same consumer connector instance appears to cause the consumer to hang in registerConsumerInZK(...), since a fresh timestamp value is generated for the consumer subscription. At a minimum, it should be clearly documented that this is not valid usage of the API. The following messages are continually logged. {code} kafka.utils.ZkUtils$ - I wrote this conflicted ephemeral node [{"version":1,"subscription":{"topic":1},"pattern":"static","timestamp":"1396019252998"}] at /consumers/test/ids/test_MAC-AO6517-1396019241689-631e0040 a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry 2014-03-28 10:08:39,086 [main] INFO kafka.utils.ZkUtils$ - conflict in /consumers/test/ids/test_MAC-AO6517-1396019241689-631e0040 data: {"version":1,"subscription":{"topic":1},"pattern":"static","timestamp":"1396019252998"} stored data: {"version":1,"subscription":{"topic":1},"pattern":"static","timestamp":"1396019241812"} {code} We're using version 0.8.1. > createMessageStreams() in javaapi.ZookeeperConsumerConnector does not throw > --- > > Key: KAFKA-1179 > URL: https://issues.apache.org/jira/browse/KAFKA-1179 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.8.0 >Reporter: Vincent Rischmann >Assignee: Neha Narkhede > > In kafka.consumer.javaapi.ZookeeperConsumerConnector.scala, the > createMessageStreams() directly calls underlying.consume() (line 80) > In kafka.consumer.ZookeeperConsumerConnector.scala, the > createMessageStreams() throws an exception if it has been called more than > once (line 133). > The javaapi should throw if it is called more than once, just like the scala > api. -- This message was sent by Atlassian JIRA (v6.2#6252)
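To avoid the conflicting-ephemeral-node loop described above, the old high-level consumer API expects a single createMessageStreams(...) call per connector covering every topic of interest. A minimal sketch of that usage follows (ZooKeeper address, group id, and topic names are placeholders):

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class SingleCreateCallExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "test");

        ConsumerConnector connector = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Request streams for every topic in ONE call; calling
        // createMessageStreams again on the same connector is not supported.
        Map<String, Integer> topicCountMap = new HashMap<>();
        topicCountMap.put("topicA", 1);
        topicCountMap.put("topicB", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

        // ... consume from streams ...
    }
}
{code}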
[jira] [Updated] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-1028: Fix Version/s: 0.8.2 > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen > Assignee: Andrew Olson > Fix For: 0.8.2 > > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability
[ https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951323#comment-13951323 ] Andrew Olson commented on KAFKA-1028: - Thanks for the info. Hopefully I'll have some time in the next week or two to tackle this, should be a fun project. > per topic configuration of preference for consistency over availability > --- > > Key: KAFKA-1028 > URL: https://issues.apache.org/jira/browse/KAFKA-1028 > Project: Kafka > Issue Type: Improvement >Reporter: Scott Clasen >Assignee: Andrew Olson > Fix For: 0.8.2 > > Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, > KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch > > > As discussed with Neha on the ML. > It should be possible to configure a topic to disallow unclean leader > election, thus preventing the situation where committed messages can be > discarded once a failed leader comes back online in a situation where it was > the only ISR. > This would open kafka to additional usecases where the possibility of > committted messages being discarded is unacceptable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970024#comment-13970024 ] Andrew Olson commented on KAFKA-493: Due to some concerns about partition count scalability, I setup a test that created a large number of idle topics (4 partitions per topic, replication factor = 3), periodically pausing topic creation to inspect the overall health of the Kafka cluster. I noticed that the CPU usage increased proportionally to the total number of topics/partitions. - With 50 topics and 200 partitions per broker, CPU consumption was averaging about 5% of a core on each system. - With 500 topics and 2000 partitions per broker, CPU consumption was averaging about 50% of a core, occasionally spiking up to 70%. - With 1000 topics and 4000 partitions per broker, CPU consumption started to approach 100% of a core. VisualVM showed that most of the time was being spent in epollWait. The Kafka cluster for this test is using Java version 1.7.0_25 on Redhat 6.2. > High CPU usage on inactive server > - > > Key: KAFKA-493 > URL: https://issues.apache.org/jira/browse/KAFKA-493 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.8.0 >Reporter: Jay Kreps > Fix For: 0.8.2 > > Attachments: stacktrace.txt > > > > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU > > usage is fairly high (13% of a > > core). Is that to be expected? I did look at the stack, but didn't see > > anything obvious. A background > > task? > > I wanted to mention how I am getting into this state. I've set up two > > machines with the latest 0.8 > > code base and am using a replication factor of 2. On starting the brokers > > there is no idle CPU activity. > > Then I run a test that essential does 10k publish operations followed by > > immediate consume operations > > (I was measuring latency). Once this has run the kafka nodes seem to > > consistently be consuming CPU > > essentially forever. 
> hprof results: > THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", > group="system") > THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 21, name="main", group="main") > THREAD START (obj=53ae, id = 27, name="Thread-2", group="main") > THREAD START (obj=53ae, id = 28, name="Thread-3", group="main") > THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", > group="main") > THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", > group="main") > THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main") > THREAD START (obj=574b, id = 200012, > name="ZkClient-EventThread-20-localhost:2181", group="main") > THREAD START (obj=576e, id = 200014, name="main-SendThread()", > group="main") > THREAD START (obj=576d, id = 200013, name="main-EventThread", > group="main") > THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", > group="main") > THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", > group="main") > THREAD START (obj=53ae, id = 200017, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200018, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", > group="main") > THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", > group="main") > THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main") > THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main") > THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on > broker 1, ", group="main
[jira] [Updated] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-493: --- Attachment: Kafka-trace1.zip Kafka-trace2.zip Kafka-trace3.zip > High CPU usage on inactive server > - > > Key: KAFKA-493 > URL: https://issues.apache.org/jira/browse/KAFKA-493 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.8.0 >Reporter: Jay Kreps > Fix For: 0.8.2 > > Attachments: Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, > stacktrace.txt > > > > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU > > usage is fairly high (13% of a > > core). Is that to be expected? I did look at the stack, but didn't see > > anything obvious. A background > > task? > > I wanted to mention how I am getting into this state. I've set up two > > machines with the latest 0.8 > > code base and am using a replication factor of 2. On starting the brokers > > there is no idle CPU activity. > > Then I run a test that essential does 10k publish operations followed by > > immediate consume operations > > (I was measuring latency). Once this has run the kafka nodes seem to > > consistently be consuming CPU > > essentially forever. > hprof results: > THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", > group="system") > THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 21, name="main", group="main") > THREAD START (obj=53ae, id = 27, name="Thread-2", group="main") > THREAD START (obj=53ae, id = 28, name="Thread-3", group="main") > THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", > group="main") > THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", > group="main") > THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main") > THREAD START (obj=574b, id = 200012, > name="ZkClient-EventThread-20-localhost:2181", group="main") > THREAD START (obj=576e, id = 200014, name="main-SendThread()", > group="main") > THREAD START (obj=576d, id = 200013, name="main-EventThread", > group="main") > THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", > group="main") > THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", > group="main") > THREAD START (obj=53ae, id = 200017, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200018, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", > group="main") > THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", > group="main") > THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main") > THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main") > THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on > broker 1, ", group="main") > THREAD START (obj=53ae, id = 200028, name="SIGINT handler", > group="system") > THREAD START (obj=53ae, id = 200029, name="Thread-5", group="main") > THREAD START (obj=574b, id = 200030, name="Thread-1", group="main") > THREAD START (obj=574b, id = 200031, name="Thread-0", group="main") > 
THREAD END (id = 200031) > THREAD END (id = 200029) > THREAD END (id = 200020) > THREAD END (id = 200019) > THREAD END (id = 28) > THREAD END (id = 200021) > THREAD END (id = 27) >
[jira] [Commented] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972963#comment-13972963 ] Andrew Olson commented on KAFKA-493: Repeated the previous test using YourKit for profiling, trace snapshot files are attached. Will post a summary of my findings later this morning. The epollWait does not appear to be a problem. > High CPU usage on inactive server > - > > Key: KAFKA-493 > URL: https://issues.apache.org/jira/browse/KAFKA-493 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.8.0 >Reporter: Jay Kreps > Fix For: 0.8.2 > > Attachments: Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, > stacktrace.txt > > > > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU > > usage is fairly high (13% of a > > core). Is that to be expected? I did look at the stack, but didn't see > > anything obvious. A background > > task? > > I wanted to mention how I am getting into this state. I've set up two > > machines with the latest 0.8 > > code base and am using a replication factor of 2. On starting the brokers > > there is no idle CPU activity. > > Then I run a test that essential does 10k publish operations followed by > > immediate consume operations > > (I was measuring latency). Once this has run the kafka nodes seem to > > consistently be consuming CPU > > essentially forever. > hprof results: > THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", > group="system") > THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 21, name="main", group="main") > THREAD START (obj=53ae, id = 27, name="Thread-2", group="main") > THREAD START (obj=53ae, id = 28, name="Thread-3", group="main") > THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", > group="main") > THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", > group="main") > THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main") > THREAD START (obj=574b, id = 200012, > name="ZkClient-EventThread-20-localhost:2181", group="main") > THREAD START (obj=576e, id = 200014, name="main-SendThread()", > group="main") > THREAD START (obj=576d, id = 200013, name="main-EventThread", > group="main") > THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", > group="main") > THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", > group="main") > THREAD START (obj=53ae, id = 200017, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200018, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", > group="main") > THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", > group="main") > THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main") > THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main") > THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on > broker 1, ", group="main") > THREAD START (obj=53ae, id = 200028, name="SIGINT handler", > group="system") > THREAD START (obj=53ae, id = 
200029, name="Thread-5", group="main") > THREAD START (obj=574b, id = 200030, name="Thread-1", group="main") > THREAD START (obj=574b, id = 200031, name="Thread-0", group="main") > THREAD END (id = 200031) > THREA
[jira] [Updated] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Olson updated KAFKA-493: --- Attachment: Kafka-sampling1.zip Kafka-sampling2.zip Kafka-sampling3.zip Also attaching sampling-based profiling data, since the tracing-based profiling data may be a bit skewed due to the trace overhead. > High CPU usage on inactive server > - > > Key: KAFKA-493 > URL: https://issues.apache.org/jira/browse/KAFKA-493 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.8.0 >Reporter: Jay Kreps > Fix For: 0.8.2 > > Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, > Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, > stacktrace.txt > > > > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU > > usage is fairly high (13% of a > > core). Is that to be expected? I did look at the stack, but didn't see > > anything obvious. A background > > task? > > I wanted to mention how I am getting into this state. I've set up two > > machines with the latest 0.8 > > code base and am using a replication factor of 2. On starting the brokers > > there is no idle CPU activity. > > Then I run a test that essential does 10k publish operations followed by > > immediate consume operations > > (I was measuring latency). Once this has run the kafka nodes seem to > > consistently be consuming CPU > > essentially forever. > hprof results: > THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", > group="system") > THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 21, name="main", group="main") > THREAD START (obj=53ae, id = 27, name="Thread-2", group="main") > THREAD START (obj=53ae, id = 28, name="Thread-3", group="main") > THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", > group="main") > THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", > group="main") > THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main") > THREAD START (obj=574b, id = 200012, > name="ZkClient-EventThread-20-localhost:2181", group="main") > THREAD START (obj=576e, id = 200014, name="main-SendThread()", > group="main") > THREAD START (obj=576d, id = 200013, name="main-EventThread", > group="main") > THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", > group="main") > THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", > group="main") > THREAD START (obj=53ae, id = 200017, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200018, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", > group="main") > THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", > group="main") > THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main") > THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main") > THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on > broker 1, ", group="main") > THREAD START (obj=53ae, id = 200028, name="SIGINT handler", > group="system") > 
THREAD START (obj=53ae, id = 200029, name="Thread-5", group="main") > THREAD START (obj=574b, id = 200030, name="Thread-1", group="main") > THREAD START (obj=574b, id = 200031, name="Thread-0", group="ma
[jira] [Commented] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973114#comment-13973114 ] Andrew Olson commented on KAFKA-493: Looks like a large amount of time is spent in the ensureTopicExists method as it iterates over the metadata cache map until it finds a matching topic (this is especially problematic (O(n²)) when processing messages that reference a large number of partitions, like the replicas fetch requests that are continually sent). Fortunately this appears to have already been fixed in trunk/0.8.1.1 by KAFKA-1356 by changing the metadata cache from a Map [TopicAndPartition,PartitionStateInfo] to a Map[Topic, Map[Partition, PartitionStateInfo]]. > High CPU usage on inactive server > - > > Key: KAFKA-493 > URL: https://issues.apache.org/jira/browse/KAFKA-493 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.8.0 >Reporter: Jay Kreps > Fix For: 0.8.2 > > Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, > Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, > stacktrace.txt > > > > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU > > usage is fairly high (13% of a > > core). Is that to be expected? I did look at the stack, but didn't see > > anything obvious. A background > > task? > > I wanted to mention how I am getting into this state. I've set up two > > machines with the latest 0.8 > > code base and am using a replication factor of 2. On starting the brokers > > there is no idle CPU activity. > > Then I run a test that essential does 10k publish operations followed by > > immediate consume operations > > (I was measuring latency). Once this has run the kafka nodes seem to > > consistently be consuming CPU > > essentially forever. 
> hprof results: > THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", > group="system") > THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", > group="system") > THREAD START (obj=53ae, id = 21, name="main", group="main") > THREAD START (obj=53ae, id = 27, name="Thread-2", group="main") > THREAD START (obj=53ae, id = 28, name="Thread-3", group="main") > THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", > group="main") > THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", > group="main") > THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main") > THREAD START (obj=574b, id = 200012, > name="ZkClient-EventThread-20-localhost:2181", group="main") > THREAD START (obj=576e, id = 200014, name="main-SendThread()", > group="main") > THREAD START (obj=576d, id = 200013, name="main-EventThread", > group="main") > THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", > group="main") > THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", > group="main") > THREAD START (obj=53ae, id = 200017, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200018, name="request-expiration-task", > group="main") > THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", > group="main") > THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", > group="main") > THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main") > THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main") > THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on > broker 1, ", group="main") > THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on > broker 1, ", group="main") > THREAD START (obj=53ae, id =
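To spell out the data-structure point from the ensureTopicExists comment above: with a flat Map keyed by TopicAndPartition, checking whether a topic exists means scanning every entry, while the nested Map keyed by topic (the KAFKA-1356 shape) makes it a single hash lookup. The Java sketch below is illustrative only, not the broker's actual metadata cache code (equals/hashCode on the key class are omitted for brevity).

{code:java}
import java.util.Map;

public class MetadataCacheLookup {

    static final class TopicAndPartition {
        final String topic;
        final int partition;

        TopicAndPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }
    }

    /** Old cache shape: topic existence requires scanning every key, O(total partitions). */
    static boolean topicExistsFlat(Map<TopicAndPartition, Object> cache, String topic) {
        for (TopicAndPartition tp : cache.keySet()) {
            if (tp.topic.equals(topic)) {
                return true;
            }
        }
        return false;
    }

    /** New cache shape: topic existence is a single hash lookup, O(1). */
    static boolean topicExistsNested(Map<String, Map<Integer, Object>> cache, String topic) {
        return cache.containsKey(topic);
    }
}
{code}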