Review Request 37480: Patch for KAFKA-2434

2015-08-14 Thread Andrew Olson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37480/
---

Review request for kafka.


Bugs: KAFKA-2434
https://issues.apache.org/jira/browse/KAFKA-2434


Repository: kafka


Description
---

Remove identical topic subscription constraint for roundrobin strategy in old 
consumer API


Diffs
-

  core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
849284ad2cfa00ed36a36548556813df639ceae9 
  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
e42d10488f8dfded0f34ba5da2da74a3cb2a5947 
  core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
adf08010597b7c6ed72eddf93962497c3209e10f 

Diff: https://reviews.apache.org/r/37480/diff/


Testing
---


Thanks,

Andrew Olson



Review Request 37481: Patch for KAFKA-2435

2015-08-14 Thread Andrew Olson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37481/
---

Review request for kafka.


Bugs: KAFKA-2435
https://issues.apache.org/jira/browse/KAFKA-2435


Repository: kafka


Description
---

Fair assignment strategy


Diffs
-

  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
d35b421a515074d964c7fccb73d260b847ea5f00 
  core/src/main/scala/kafka/consumer/ConsumerConfig.scala 
97a56ce7f2acbf9b5c9b8360268b72722791a96a 
  core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
849284ad2cfa00ed36a36548556813df639ceae9 
  core/src/main/scala/kafka/coordinator/PartitionAssignor.scala 
8499bf86f1de21091633d047e95b260003132ac8 
  core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
adf08010597b7c6ed72eddf93962497c3209e10f 
  core/src/test/scala/unit/kafka/coordinator/PartitionAssignorTest.scala 
887cee5a582b5737ba838920399bb9b24bf22382 

Diff: https://reviews.apache.org/r/37481/diff/


Testing
---


Thanks,

Andrew Olson



Re: Review Request 17537: Patch for KAFKA-1028

2014-03-03 Thread Andrew Olson


> On March 3, 2014, 7:03 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/admin/AdminUtils.scala, line 356
> > <https://reviews.apache.org/r/17537/diff/3/?file=456702#file456702line356>
> >
> > "unclean leader" is a bit confusing, how about 
> > uncleanLeaderElectionEnabled?

Sounds good to me. Just to confirm, you are suggesting that it is preferable to 
name this simply "uncleanLeaderElectionEnabled" instead of something more 
verbose like "isUncleanLeaderElectionEnabledForTopic" or the current name?


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/#review35999
-------


On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17537/
> ---
> 
> (Updated Jan. 30, 2014, 7:45 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1028
> https://issues.apache.org/jira/browse/KAFKA-1028
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1028: per topic configuration of preference for consistency over 
> availability
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala 
> a167756f0fd358574c8ccb42c5c96aaf13def4f5 
>   core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
> a1e12794978adf79020936c71259bbdabca8ee68 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> a0267ae2670e8d5f365e49ec0fa5db1f62b815bf 
>   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
> fd9200f3bf941aab54df798bb5899eeb552ea3a3 
>   core/src/main/scala/kafka/log/LogConfig.scala 
> 0b32aeeffcd9d4755ac90573448d197d3f729749 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> 73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
>   core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
> b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
>   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
> PRE-CREATION 
>   core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
> 89c207a3f56c7a7711f8cee6fb277626329882a6 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 426b1a7bea1d83a64081f2c6b672c88c928713b7 
> 
> Diff: https://reviews.apache.org/r/17537/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Olson
> 
>



Re: Review Request 17537: Patch for KAFKA-1028

2014-03-03 Thread Andrew Olson


> On March 3, 2014, 7:03 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala, line 64
> > <https://reviews.apache.org/r/17537/diff/3/?file=456705#file456705line64>
> >
> > Not sure throwing an exception is the right behavior here? If unclean 
> > election is disabled, which means we choose consistency over availability, 
> > when ISR is empty we should keep the partition as offline from the clients 
> > until it is not empty. Throwing an exception will just halt the whole 
> > partition state change process, but may not necessarily resume when the ISR 
> > has changed. Could you confirm in your integration tests if this is the 
> > case?

The exception-handling and partition state changes are verified in the 
integration test, see lines 226-249 of UncleanLeaderElectionTest.


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/#review35999
-------


On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17537/
> ---
> 
> (Updated Jan. 30, 2014, 7:45 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1028
> https://issues.apache.org/jira/browse/KAFKA-1028
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1028: per topic configuration of preference for consistency over 
> availability
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala 
> a167756f0fd358574c8ccb42c5c96aaf13def4f5 
>   core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
> a1e12794978adf79020936c71259bbdabca8ee68 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> a0267ae2670e8d5f365e49ec0fa5db1f62b815bf 
>   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
> fd9200f3bf941aab54df798bb5899eeb552ea3a3 
>   core/src/main/scala/kafka/log/LogConfig.scala 
> 0b32aeeffcd9d4755ac90573448d197d3f729749 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> 73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
>   core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
> b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
>   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
> PRE-CREATION 
>   core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
> 89c207a3f56c7a7711f8cee6fb277626329882a6 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 426b1a7bea1d83a64081f2c6b672c88c928713b7 
> 
> Diff: https://reviews.apache.org/r/17537/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Olson
> 
>



Re: Review Request 17537: Patch for KAFKA-1028

2014-03-03 Thread Andrew Olson


> On March 3, 2014, 7:03 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala, line 86
> > <https://reviews.apache.org/r/17537/diff/3/?file=456708#file456708line86>
> >
> > I am not sure we need to check the unclean election flag here. If we 
> > can keep the partition as offline for clients in the controller phase, will 
> > that be sufficient since no data will be appended/consumed from these 
> > replica logs until that is unblocked?
> > 
> > As for reading the configs, I agree that if we consider per-topic 
> > configs then we probably need to read it from ZK.

Yes, under normal circumstances the partition would be kept offline until the 
previous leader is available, which would be sufficient to prevent log 
truncation from ever being considered. This code was added as an extra 
safeguard to handle the remote possibility of the config being changed while a 
replica is offline, which I believe is the only way this code path could be 
reached. I tried to describe this in the comment below: "This situation could 
only happen if the unclean election configuration for a topic changes while a 
replica is down. Otherwise, we should never encounter this situation since a 
non-ISR leader cannot be elected if disallowed by the broker configuration."

In the follow-up patch I'll move this comment above the AdminUtils check so 
that it's more visible.
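
For reference, the check being discussed is roughly the following (a sketch only, pieced together from the helper name and offsets quoted elsewhere in this review; exact signatures may differ from the patch):

{code}
// Sketch (not the exact patch): before truncating the follower's log to the
// leader's end offset, confirm that unclean leader election is allowed for
// this topic, resolving the per-topic override from ZooKeeper on demand.
if (leaderEndOffset < replica.logEndOffset.messageOffset) {
  if (!AdminUtils.canElectUncleanLeaderForTopic(replicaMgr.zkClient,
        topicAndPartition.topic, brokerConfig.uncleanLeaderElectionEnable)) {
    // Halting is preferable to silently truncating committed messages.
    fatal("Leader's end offset is behind ours and unclean leader election is disabled")
    Runtime.getRuntime.halt(1)
  }
}
{code}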


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/#review35999
-------


On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17537/
> ---
> 
> (Updated Jan. 30, 2014, 7:45 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1028
> https://issues.apache.org/jira/browse/KAFKA-1028
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1028: per topic configuration of preference for consistency over 
> availability
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala 
> a167756f0fd358574c8ccb42c5c96aaf13def4f5 
>   core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
> a1e12794978adf79020936c71259bbdabca8ee68 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> a0267ae2670e8d5f365e49ec0fa5db1f62b815bf 
>   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
> fd9200f3bf941aab54df798bb5899eeb552ea3a3 
>   core/src/main/scala/kafka/log/LogConfig.scala 
> 0b32aeeffcd9d4755ac90573448d197d3f729749 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> 73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
>   core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
> b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
>   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
> PRE-CREATION 
>   core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
> 89c207a3f56c7a7711f8cee6fb277626329882a6 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 426b1a7bea1d83a64081f2c6b672c88c928713b7 
> 
> Diff: https://reviews.apache.org/r/17537/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Olson
> 
>



Re: Review Request 17537: Patch for KAFKA-1028

2014-03-03 Thread Andrew Olson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/
---

(Updated March 4, 2014, 12:48 a.m.)


Review request for kafka.


Bugs: KAFKA-1028
https://issues.apache.org/jira/browse/KAFKA-1028


Repository: kafka


Description
---

KAFKA-1028: per topic configuration of preference for consistency over 
availability


Diffs (updated)
-

  core/src/main/scala/kafka/admin/AdminUtils.scala 
36ddeb44490e8343a4e8056c45726ac660e4b2f9 
  core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
a1e12794978adf79020936c71259bbdabca8ee68 
  core/src/main/scala/kafka/controller/KafkaController.scala 
b58cdcd16ffb62ba5329b8b2776f2bd18440b3a0 
  core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
fa29bbe82db35551f43ac487912fba7bae1b2599 
  core/src/main/scala/kafka/log/LogConfig.scala 
0b32aeeffcd9d4755ac90573448d197d3f729749 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
04a5d39be3c41f5cb3589d95d0bd0f4fb4d7030d 
  core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
  core/src/test/scala/unit/kafka/admin/AdminTest.scala 
d5644ea40ec7678b975c4775546b79fcfa9f64b7 
  core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
  core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
PRE-CREATION 
  core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
89c207a3f56c7a7711f8cee6fb277626329882a6 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
772d2140ed926a2f9f0c99aea60daf1a9b987073 

Diff: https://reviews.apache.org/r/17537/diff/


Testing
---


Thanks,

Andrew Olson



Re: Review Request 17537: Patch for KAFKA-1028

2014-03-03 Thread Andrew Olson


> On Feb. 4, 2014, 11:21 p.m., Neha Narkhede wrote:
> > core/src/main/scala/kafka/server/ReplicaFetcherThread.scala, line 86
> > <https://reviews.apache.org/r/17537/diff/3/?file=456708#file456708line86>
> >
> > Reading from zookeeper is unnecessary here, since the broker has a 
> > mechanism to load per topic configs on the fly. Just accessing it through 
> > the config object should suffice
> 
> Neha Narkhede wrote:
> Here you could just do 
> 
> if(brokerConfig.uncleanLeaderElectionEnable) instead of if 
> (!AdminUtils.canElectUncleanLeaderForTopic(replicaMgr.zkClient, 
> topicAndPartition.topic, brokerConfig.uncleanLeaderElectionEnable))
> 
> To understand how the broker dynamically loads per topic configs, you can 
> look at TopicConfigManager. It registers a zookeeper listener on the topic 
> config change path and atomically switches the log config object to reflect 
> the new per topic config overrides.

See justification in PartitionLeaderSelector comment thread above.


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/#review33658
-------


On March 4, 2014, 12:48 a.m., Andrew Olson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17537/
> ---
> 
> (Updated March 4, 2014, 12:48 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1028
> https://issues.apache.org/jira/browse/KAFKA-1028
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1028: per topic configuration of preference for consistency over 
> availability
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala 
> 36ddeb44490e8343a4e8056c45726ac660e4b2f9 
>   core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
> a1e12794978adf79020936c71259bbdabca8ee68 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> b58cdcd16ffb62ba5329b8b2776f2bd18440b3a0 
>   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
> fa29bbe82db35551f43ac487912fba7bae1b2599 
>   core/src/main/scala/kafka/log/LogConfig.scala 
> 0b32aeeffcd9d4755ac90573448d197d3f729749 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> 04a5d39be3c41f5cb3589d95d0bd0f4fb4d7030d 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> 73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
>   core/src/test/scala/unit/kafka/admin/AdminTest.scala 
> d5644ea40ec7678b975c4775546b79fcfa9f64b7 
>   core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
> b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
>   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
> PRE-CREATION 
>   core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
> 89c207a3f56c7a7711f8cee6fb277626329882a6 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 772d2140ed926a2f9f0c99aea60daf1a9b987073 
> 
> Diff: https://reviews.apache.org/r/17537/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Olson
> 
>



Re: Review Request 17537: Patch for KAFKA-1028

2014-03-14 Thread Andrew Olson


> On March 14, 2014, 8:13 p.m., Jay Kreps wrote:
> > This is great!
> > 
> > I don't think we should add a special purpose method that hard codes one 
> > property in AdminUtils, I think the helper code is actually there in 
> > LogConfig.
> > 
> > I see Neha's point about adding a higher-level config to log config but I 
> > think it is worth it and not so bad as it is just a config--there is no 
> > major code coupling problem.
> > 
> > The only real question I have which is really for Neha is whether adding 
> > these new ZK queries is a performance concern for leadership election and 
> > transfer? Should instead the controller or whoever just watch and 
> > constantly replicate these log configs all the time so that when election 
> > time comes no new fetches are required. This could perhaps be done as a 
> > follow-up improvement...
> > 
> >

Thanks for the feedback. The LogConfig.fromProps(...) approach works, but it 
looks like I'd need to update the implementation to provide default values for 
each of the props.getProperty(...) calls (it currently assumes that all 
supported properties are included in the provided Properties). Should I go 
ahead and make that change?
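
For illustration, one possible shape of that change (a sketch with assumed signatures, not the actual patch) is to overlay the per-topic overrides on the broker-level defaults before delegating to the existing single-argument fromProps:

{code}
import java.util.Properties

// Hypothetical sketch: fall back to broker-level defaults for any property
// missing from the per-topic overrides, instead of assuming the supplied
// Properties contains every supported key.
def fromProps(defaults: Properties, overrides: Properties): LogConfig = {
  val combined = new Properties()
  combined.putAll(defaults)   // broker-level defaults first
  combined.putAll(overrides)  // per-topic overrides take precedence
  fromProps(combined)         // existing single-argument fromProps
}
{code}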


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/#review37265
---


On March 4, 2014, 12:48 a.m., Andrew Olson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17537/
> ---
> 
> (Updated March 4, 2014, 12:48 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1028
> https://issues.apache.org/jira/browse/KAFKA-1028
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1028: per topic configuration of preference for consistency over 
> availability
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala 
> 36ddeb44490e8343a4e8056c45726ac660e4b2f9 
>   core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
> a1e12794978adf79020936c71259bbdabca8ee68 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> b58cdcd16ffb62ba5329b8b2776f2bd18440b3a0 
>   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
> fa29bbe82db35551f43ac487912fba7bae1b2599 
>   core/src/main/scala/kafka/log/LogConfig.scala 
> 0b32aeeffcd9d4755ac90573448d197d3f729749 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> 04a5d39be3c41f5cb3589d95d0bd0f4fb4d7030d 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> 73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
>   core/src/test/scala/unit/kafka/admin/AdminTest.scala 
> d5644ea40ec7678b975c4775546b79fcfa9f64b7 
>   core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
> b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
>   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
> PRE-CREATION 
>   core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
> 89c207a3f56c7a7711f8cee6fb277626329882a6 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 772d2140ed926a2f9f0c99aea60daf1a9b987073 
> 
> Diff: https://reviews.apache.org/r/17537/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Olson
> 
>



Re: Review Request 17537: Patch for KAFKA-1028

2014-03-17 Thread Andrew Olson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/
---

(Updated March 17, 2014, 2:39 p.m.)


Review request for kafka.


Bugs: KAFKA-1028
https://issues.apache.org/jira/browse/KAFKA-1028


Repository: kafka


Description
---

KAFKA-1028: per topic configuration of preference for consistency over 
availability


Diffs (updated)
-

  core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
a1e12794978adf79020936c71259bbdabca8ee68 
  core/src/main/scala/kafka/controller/KafkaController.scala 
5db24a73a62c725a76c79936abb15b1fda3b770b 
  core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
fa29bbe82db35551f43ac487912fba7bae1b2599 
  core/src/main/scala/kafka/log/LogConfig.scala 
18c86fed5efc765f9b059775988cc83ef0ef3c3b 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
d07796ec87fa9bf0d13e1690240b85979aba3224 
  core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
  core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
  core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
PRE-CREATION 
  core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
89c207a3f56c7a7711f8cee6fb277626329882a6 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
772d2140ed926a2f9f0c99aea60daf1a9b987073 

Diff: https://reviews.apache.org/r/17537/diff/


Testing
---


Thanks,

Andrew Olson



Re: Review Request 17537: Patch for KAFKA-1028

2014-03-17 Thread Andrew Olson


> On March 14, 2014, 8:13 p.m., Jay Kreps wrote:
> > This is great!
> > 
> > I don't think we should add a special purpose method that hard codes one 
> > property in AdminUtils, I think the helper code is actually there in 
> > LogConfig.
> > 
> > I see Neha's point about adding a higher-level config to log config but I 
> > think it is worth it and not so bad as it is just a config--there is no 
> > major code coupling problem.
> > 
> > The only real question I have which is really for Neha is whether adding 
> > these new ZK queries is a performance concern for leadership election and 
> > transfer? Should instead the controller or whoever just watch and 
> > constantly replicate these log configs all the time so that when election 
> > time comes no new fetches are required. This could perhaps be done as a 
> > follow-up improvement...
> > 
> >
> 
> Andrew Olson wrote:
> Thanks for the feedback. The LogConfig.fromProps(...) approach works, but 
> it looks like I'd need to update the implementation to provide default values 
> for each of the props.getProperty(...) calls (it currently assumes that all 
> supported properties are included in the provided Properties). Should I go 
> ahead and make that change?
> 
> Neha Narkhede wrote:
> I do think that the zookeeper read is a performance concern during 
> leadership changes, since the patch currently does one read per partition. I 
> think it is not so much a problem with the patch but more with us improving 
> our handling of per topic non-log configs. As you suggested, we should add 
> the ability to refresh per topic configs on the controller and possibly the 
> ReplicaManager as well. But I think we can do that as a follow up improvement 
> and accept this patch.
> 
> Neha Narkhede wrote:
> >> Thanks for the feedback. The LogConfig.fromProps(...) approach works, 
> but it looks like I'd need to update the implementation to provide default 
> values for each of the props.getProperty(...) calls (it currently assumes 
> that all supported properties are included in the provided Properties). 
> Should I go ahead and make that change?
> 
> Yes please. Would you mind updating your patch with that change and also 
> rebasing it? I think we can check it in after that.

Patch has been rebased and updated.


- Andrew


-------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/#review37265
---


On March 17, 2014, 2:39 p.m., Andrew Olson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17537/
> ---
> 
> (Updated March 17, 2014, 2:39 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1028
> https://issues.apache.org/jira/browse/KAFKA-1028
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1028: per topic configuration of preference for consistency over 
> availability
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
> a1e12794978adf79020936c71259bbdabca8ee68 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> 5db24a73a62c725a76c79936abb15b1fda3b770b 
>   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
> fa29bbe82db35551f43ac487912fba7bae1b2599 
>   core/src/main/scala/kafka/log/LogConfig.scala 
> 18c86fed5efc765f9b059775988cc83ef0ef3c3b 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> d07796ec87fa9bf0d13e1690240b85979aba3224 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> 73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
>   core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
> b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
>   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
> PRE-CREATION 
>   core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
> 89c207a3f56c7a7711f8cee6fb277626329882a6 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 772d2140ed926a2f9f0c99aea60daf1a9b987073 
> 
> Diff: https://reviews.apache.org/r/17537/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Olson
> 
>



Re: Review Request 17537: Patch for KAFKA-1028

2014-03-17 Thread Andrew Olson


> On March 17, 2014, 4:30 p.m., Jay Kreps wrote:
> > core/src/main/scala/kafka/server/KafkaConfig.scala, line 256
> > <https://reviews.apache.org/r/17537/diff/4-5/?file=509273#file509273line256>
> >
> > Are these config changes due to needing a rebase or something? They 
> > don't seem related to the patch.

Yes, the rebase this morning picked up KAFKA-1012 [1]. Should a rebase and 
patch code update be uploaded to the review board separately?

[1] 
https://github.com/apache/kafka/commit/a670537aa33732b15b56644d8ccc1681e16395f5


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/#review37386
---


On March 17, 2014, 2:39 p.m., Andrew Olson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17537/
> ---
> 
> (Updated March 17, 2014, 2:39 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1028
> https://issues.apache.org/jira/browse/KAFKA-1028
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1028: per topic configuration of preference for consistency over 
> availability
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
> a1e12794978adf79020936c71259bbdabca8ee68 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> 5db24a73a62c725a76c79936abb15b1fda3b770b 
>   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
> fa29bbe82db35551f43ac487912fba7bae1b2599 
>   core/src/main/scala/kafka/log/LogConfig.scala 
> 18c86fed5efc765f9b059775988cc83ef0ef3c3b 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> d07796ec87fa9bf0d13e1690240b85979aba3224 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> 73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
>   core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
> b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
>   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
> PRE-CREATION 
>   core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
> 89c207a3f56c7a7711f8cee6fb277626329882a6 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 772d2140ed926a2f9f0c99aea60daf1a9b987073 
> 
> Diff: https://reviews.apache.org/r/17537/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Olson
> 
>



Review Request 17537: Patch for KAFKA-1028

2014-01-30 Thread Andrew Olson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/
---

Review request for kafka.


Bugs: KAFKA-1028
https://issues.apache.org/jira/browse/KAFKA-1028


Repository: kafka


Description
---

KAFKA-1028: per topic configuration of preference for consistency over 
availability


Diffs
-

  core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
a1e12794978adf79020936c71259bbdabca8ee68 
  core/src/main/scala/kafka/controller/KafkaController.scala 
a0267ae2670e8d5f365e49ec0fa5db1f62b815bf 
  core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
fd9200f3bf941aab54df798bb5899eeb552ea3a3 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef 
  core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
  core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
  core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
PRE-CREATION 
  core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
89c207a3f56c7a7711f8cee6fb277626329882a6 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
d88b6c3e8fd8098d540998b6a82a65cec8a1dcb0 

Diff: https://reviews.apache.org/r/17537/diff/


Testing
---


Thanks,

Andrew Olson



Re: Review Request 17537: Patch for KAFKA-1028

2014-01-30 Thread Andrew Olson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/
---

(Updated Jan. 30, 2014, 7:42 p.m.)


Review request for kafka.


Bugs: KAFKA-1028
https://issues.apache.org/jira/browse/KAFKA-1028


Repository: kafka


Description
---

KAFKA-1028: per topic configuration of preference for consistency over 
availability


Diffs (updated)
-

  core/src/main/scala/kafka/admin/AdminUtils.scala 
a167756f0fd358574c8ccb42c5c96aaf13def4f5 
  core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
a1e12794978adf79020936c71259bbdabca8ee68 
  core/src/main/scala/kafka/controller/KafkaController.scala 
a0267ae2670e8d5f365e49ec0fa5db1f62b815bf 
  core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
fd9200f3bf941aab54df798bb5899eeb552ea3a3 
  core/src/main/scala/kafka/log/LogConfig.scala 
0b32aeeffcd9d4755ac90573448d197d3f729749 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef 
  core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
  core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
  core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
PRE-CREATION 
  core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
89c207a3f56c7a7711f8cee6fb277626329882a6 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
d88b6c3e8fd8098d540998b6a82a65cec8a1dcb0 

Diff: https://reviews.apache.org/r/17537/diff/


Testing
---


Thanks,

Andrew Olson



Re: Review Request 17537: Patch for KAFKA-1028

2014-01-30 Thread Andrew Olson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/
---

(Updated Jan. 30, 2014, 7:45 p.m.)


Review request for kafka.


Bugs: KAFKA-1028
https://issues.apache.org/jira/browse/KAFKA-1028


Repository: kafka


Description
---

KAFKA-1028: per topic configuration of preference for consistency over 
availability


Diffs (updated)
-

  core/src/main/scala/kafka/admin/AdminUtils.scala 
a167756f0fd358574c8ccb42c5c96aaf13def4f5 
  core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
a1e12794978adf79020936c71259bbdabca8ee68 
  core/src/main/scala/kafka/controller/KafkaController.scala 
a0267ae2670e8d5f365e49ec0fa5db1f62b815bf 
  core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
fd9200f3bf941aab54df798bb5899eeb552ea3a3 
  core/src/main/scala/kafka/log/LogConfig.scala 
0b32aeeffcd9d4755ac90573448d197d3f729749 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef 
  core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
  core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
  core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
PRE-CREATION 
  core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
89c207a3f56c7a7711f8cee6fb277626329882a6 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
426b1a7bea1d83a64081f2c6b672c88c928713b7 

Diff: https://reviews.apache.org/r/17537/diff/


Testing
---


Thanks,

Andrew Olson



Re: Review Request 17537: Patch for KAFKA-1028

2014-02-05 Thread Andrew Olson


> On Feb. 4, 2014, 11:21 p.m., Neha Narkhede wrote:
> > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala, line 64
> > <https://reviews.apache.org/r/17537/diff/3/?file=456705#file456705line64>
> >
> > the config object should already have the per topic preference for 
> > unclean leader election. So we don't have to read from zookeeper again.

It doesn't look like this is actually the case. The KafkaConfig is passed from 
the KafkaServer to the KafkaController with no topic context, and the 
controller does not appear to be integrated with the topic log configuration 
logic in the TopicConfigManager/LogManager.

Just to confirm my understanding of the code, I removed this Zookeeper read and 
doing so caused the two TopicOverride integration tests that I added to fail. 
Is there a simpler or less awkward way to implement this as a per-topic 
configuration? Reading the config on demand from ZK seems like the simplest and 
least invasive option since this should not be a frequently executed code path, 
but I could be missing something obvious.
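
To make the trade-off concrete, the on-demand lookup amounts to roughly the following (a rough sketch with assumed names, in the spirit of the AdminUtils helper quoted elsewhere in this review):

{code}
// Sketch only: when the ISR is empty, consult the per-topic override in
// ZooKeeper rather than the broker-level KafkaConfig, since the controller
// has no topic context for this setting.
if (liveBrokersInIsr.isEmpty) {
  if (!AdminUtils.canElectUncleanLeaderForTopic(zkClient, topic,
        brokerConfig.uncleanLeaderElectionEnable))
    throw new NoReplicaOnlineException(
      "No replica in ISR is alive and unclean leader election is disabled for topic " + topic)
  // otherwise fall through to electing a live assigned replica outside the ISR
}
{code}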


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/#review33658
-------


On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17537/
> ---
> 
> (Updated Jan. 30, 2014, 7:45 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1028
> https://issues.apache.org/jira/browse/KAFKA-1028
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1028: per topic configuration of preference for consistency over 
> availability
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala 
> a167756f0fd358574c8ccb42c5c96aaf13def4f5 
>   core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
> a1e12794978adf79020936c71259bbdabca8ee68 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> a0267ae2670e8d5f365e49ec0fa5db1f62b815bf 
>   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
> fd9200f3bf941aab54df798bb5899eeb552ea3a3 
>   core/src/main/scala/kafka/log/LogConfig.scala 
> 0b32aeeffcd9d4755ac90573448d197d3f729749 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> 3c3aafc2b3f06fc8f3168a8a9c1e0b08e944c1ef 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> 73e605eb31bc71642d48b0bb8bd1632fd70b9dca 
>   core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala 
> b585f0ec0b1c402d95a3b34934dab7545dcfcb1f 
>   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
> PRE-CREATION 
>   core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
> 89c207a3f56c7a7711f8cee6fb277626329882a6 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 426b1a7bea1d83a64081f2c6b672c88c928713b7 
> 
> Diff: https://reviews.apache.org/r/17537/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Olson
> 
>



Re: Review Request 17537: Patch for KAFKA-1028

2014-02-07 Thread Andrew Olson


> On Feb. 4, 2014, 11:21 p.m., Neha Narkhede wrote:
> > core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala, line 64
> > <https://reviews.apache.org/r/17537/diff/3/?file=456705#file456705line64>
> >
> > the config object should already have the per topic preference for 
> > unclean leader election. So we don't have to read from zookeeper again.
> 
> Andrew Olson wrote:
> It doesn't look like this is actually the case. The KafkaConfig is passed 
> from the KafkaServer to the KafkaController with no topic context, and the 
> controller does not appear to be integrated with the topic log configuration 
> logic in the TopicConfigManager/LogManager.
> 
> Just to confirm my understanding of the code, I removed this Zookeeper 
> read and doing so caused the two TopicOverride integration tests that I added 
> to fail. Is there is a simpler or less awkward way to implement this as per 
> topic configuration? Reading the config on demand from ZK seems like the 
> simplest and least invasive option since this should not be a frequently 
> executed code path, but I could be missing something obvious.
> 
> Neha Narkhede wrote:
> >> It doesn't look like this is actually the case. The KafkaConfig is 
> passed from the KafkaServer to the KafkaController with no topic context, and 
> the controller does not appear to be integrated with the topic log 
> configuration logic in the TopicConfigManager/LogManager.
> 
> To understand how the broker dynamically loads per topic configs, you can 
> look at TopicConfigManager. It registers a zookeeper listener on the topic 
> config change path and atomically switches the log config object to reflect 
> the new per topic config overrides.
> 
> I haven't looked at the tests in detail, but are you introducing the per 
> topic config overrides the same way as TopicCommand does (by writing to the 
> correct zk path)? That is the only way it will be automatically reloaded by 
> all the brokers, including the controller

>> are you introducing the per topic config overrides the same way as 
>> TopicCommand does (by writing to the correct zk path)?
Yes, I'm using AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(...) 
in the tests.
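
For example, the override is applied at topic creation time roughly like this (a sketch; the exact parameter list of this AdminUtils method and the property name are illustrative and may differ by version):

{code}
import java.util.Properties

// Sketch of introducing a per-topic override through the admin ZK path,
// as done in the integration tests.
val topicProps = new Properties()
topicProps.put("unclean.leader.election.enable", "false")
AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(
  zkClient, topic, partitionReplicaAssignment, topicProps)
{code}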

>> To understand how the broker dynamically loads per topic configs, you can 
>> look at TopicConfigManager.
After reading through the code some more, I think I understand how the 
TopicConfigManager works. It is currently only integrated with the LogManager 
[1]. The LogManager functionality is isolated from the KafkaController and 
ReplicaFetcherThread, which only have access to the base server KafkaConfig. 
The KafkaConfig is initialized when the broker starts and is immutable. 
The dynamic configuration updates you're referring to are done in the 
LogManager's map of LogConfig instances. I didn't want to introduce an 
abstraction leak by passing the LogManager instance to the controller and 
replica fetcher threads and making this new replica configuration part of the 
LogConfig. I also wasn't sure whether it was worth the effort and extra 
complexity to enhance the TopicConfigManager to also automatically reload 
replica-related configuration in addition to log-related configuration (i.e., 
adding new ReplicaConfig and ReplicaManager classes similar to LogConfig and 
LogManager), since there is currently only a single configuration property that 
is not very frequently checked.

[1] 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaServer.scala#L98


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17537/#review33658
---


On Jan. 30, 2014, 7:45 p.m., Andrew Olson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17537/
> ---
> 
> (Updated Jan. 30, 2014, 7:45 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1028
> https://issues.apache.org/jira/browse/KAFKA-1028
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1028: per topic configuration of preference for consistency over 
> availability
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala 
> a167756f0fd358574c8ccb42c5c96aaf13def4f5 
>   core/src/main/scala/kafka/common/NoReplicaOnlineException.scala 
> a1e12794978adf79020936c71259bbdabca8ee68 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> a

[jira] [Created] (KAFKA-6334) Minor documentation typo

2017-12-08 Thread Andrew Olson (JIRA)
Andrew Olson created KAFKA-6334:
---

 Summary: Minor documentation typo
 Key: KAFKA-6334
 URL: https://issues.apache.org/jira/browse/KAFKA-6334
 Project: Kafka
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.0.0
Reporter: Andrew Olson
Priority: Trivial


At [1]:

{quote}
0.11.0 consumers support backwards compatibility with brokers 0.10.0 brokers 
and upward, so it is possible to upgrade the clients first before the brokers
{quote}

Specifically the "brokers 0.10.0 brokers" wording.

[1] http://kafka.apache.org/documentation.html#upgrade_11_message_format





[jira] [Resolved] (KAFKA-1379) Partition reassignment resets clock for time-based retention

2017-08-11 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson resolved KAFKA-1379.
-
   Resolution: Fixed
 Assignee: Jiangjie Qin
Fix Version/s: 0.10.1.0

Marking this bug as resolved in 0.10.1.0 based on the statement "The log 
retention time is no longer based on last modified time of the log segments. 
Instead it will be based on the largest timestamp of the messages in a log 
segment." in http://kafka.apache.org/documentation.html#upgrade_10_1_breaking

> Partition reassignment resets clock for time-based retention
> 
>
> Key: KAFKA-1379
> URL: https://issues.apache.org/jira/browse/KAFKA-1379
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Joel Koshy
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> Since retention is driven off mod-times reassigned partitions will result in
> data that has been on a leader to be retained for another full retention
> cycle. E.g., if retention is seven days and you reassign partitions on the
> sixth day then those partitions will remain on the replicas for another
> seven days.





[jira] [Commented] (KAFKA-3924) Data loss due to halting when LEO is larger than leader's LEO

2016-07-18 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383107#comment-15383107
 ] 

Andrew Olson commented on KAFKA-3924:
-

Reviewed, looks good to me.

> Data loss due to halting when LEO is larger than leader's LEO
> -
>
> Key: KAFKA-3924
> URL: https://issues.apache.org/jira/browse/KAFKA-3924
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Maysam Yabandeh
>
> Currently the follower broker panics when its LEO is larger than its leader's 
> LEO,  and assuming that this is an impossible state to reach, halts the 
> process to prevent any further damage.
> {code}
> if (leaderEndOffset < replica.logEndOffset.messageOffset) {
>   // Prior to truncating the follower's log, ensure that doing so is not 
> disallowed by the configuration for unclean leader election.
>   // This situation could only happen if the unclean election 
> configuration for a topic changes while a replica is down. Otherwise,
>   // we should never encounter this situation since a non-ISR leader 
> cannot be elected if disallowed by the broker configuration.
>   if (!LogConfig.fromProps(brokerConfig.originals, 
> AdminUtils.fetchEntityConfig(replicaMgr.zkUtils,
> ConfigType.Topic, 
> topicAndPartition.topic)).uncleanLeaderElectionEnable) {
> // Log a fatal error and shutdown the broker to ensure that data loss 
> does not unexpectedly occur.
> fatal("...")
> Runtime.getRuntime.halt(1)
>   }
> {code}
> Firstly this assumption is invalid and there are legitimate cases (examples 
> below) that this case could actually occur. Secondly halt results into the 
> broker losing its un-flushed data, and if multiple brokers halt 
> simultaneously there is a chance that both leader and followers of a 
> partition are among the halted brokers, which would result into permanent 
> data loss.
> Given that this is a legit case, we suggest to replace it with a graceful 
> shutdown to avoid propagating data loss to the entire cluster.
> Details:
> One legit case that this could actually occur is when a troubled broker 
> shrinks its partitions right before crashing (KAFKA-3410 and KAFKA-3861). In 
> this case the broker has lost some data but the controller cannot still 
> elects the others as the leader. If the crashed broker comes back up, the 
> controller elects it as the leader, and as a result all other brokers who are 
> now following it halt since they have LEOs larger than that of shrunk topics 
> in the restarted broker.  We actually had a case that bringing up a crashed 
> broker simultaneously took down the entire cluster and as explained above 
> this could result into data loss.
> The other legit case is when multiple brokers ungracefully shutdown at the 
> same time. In this case both of the leader and the followers lose their 
> un-flushed data but one of them has HW larger than the other. Controller 
> elects the one who comes back up sooner as the leader and if its LEO is less 
> than its future follower, the follower will halt (and probably lose more 
> data). Simultaneous ungrateful shutdown could happen due to hardware issue 
> (e.g., rack power failure), operator errors, or software issue (e.g., the 
> case above that is further explained in KAFKA-3410 and KAFKA-3861 and causes 
> simultaneous halts in multiple brokers)





[jira] [Updated] (KAFKA-3971) Consumers drop from coordinator and cannot reconnect

2016-08-10 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3971:

Summary: Consumers drop from coordinator and cannot reconnect  (was: 
Consumers drop from coordinator and cannot reconnet)

> Consumers drop from coordinator and cannot reconnect
> 
>
> Key: KAFKA-3971
> URL: https://issues.apache.org/jira/browse/KAFKA-3971
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: version 0.9.0.1
>Reporter: Lei Wang
> Attachments: KAFKA-3971.txt
>
>
> From time to time, we're creating new topics, and all consumers will pickup 
> those new topics. When starting to consume from these new topics, we often 
> see some of random consumers cannot connect to the coordinator. The log will 
> be flushed with the following log message tens of thousands every second:
> {noformat}
> 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> {noformat}
> the servers seem working fine, and other consumers are also happy.
> from the log, looks like it's retrying multiple times every millisecond but 
> all failing.
> the same process are consuming from many topics, some of them are still 
> working well, but those random topics will fail.





[jira] [Commented] (KAFKA-3971) Consumers drop from coordinator and cannot reconnect

2016-08-10 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415806#comment-15415806
 ] 

Andrew Olson commented on KAFKA-3971:
-

How many partitions per topic do you have?

> Consumers drop from coordinator and cannot reconnect
> 
>
> Key: KAFKA-3971
> URL: https://issues.apache.org/jira/browse/KAFKA-3971
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: version 0.9.0.1
>Reporter: Lei Wang
> Attachments: KAFKA-3971.txt
>
>
> From time to time, we're creating new topics, and all consumers will pickup 
> those new topics. When starting to consume from these new topics, we often 
> see some of random consumers cannot connect to the coordinator. The log will 
> be flushed with the following log message tens of thousands every second:
> {noformat}
> 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> {noformat}
> the servers seem working fine, and other consumers are also happy.
> from the log, looks like it's retrying multiple times every millisecond but 
> all failing.
> the same process are consuming from many topics, some of them are still 
> working well, but those random topics will fail.





[jira] [Commented] (KAFKA-3971) Consumers drop from coordinator and cannot reconnect

2016-08-10 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415838#comment-15415838
 ] 

Andrew Olson commented on KAFKA-3971:
-

Even at 1 partition, if a single process is consuming from that many topics, its 
thread count would be very high, since a separate fetcher thread is started for each 
partition being consumed. So you may be at risk of running into a "max user 
processes" sort of ulimit or out of memory, especially if there is a 
substantial amount of data in each topic when the consumer is started, as each 
thread can read up to fetch.message.max.bytes worth of data.
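
As a rough illustration (all numbers are hypothetical; 1 MB is the old consumer's default fetch.message.max.bytes):

{code}
// Back-of-envelope estimate, not a measurement: one fetcher thread per
// consumed partition, each able to buffer up to fetch.message.max.bytes.
val topics = 500                          // hypothetical topic count
val partitionsPerTopic = 1
val fetchMessageMaxBytes = 1024 * 1024L
val fetcherThreads = topics * partitionsPerTopic                   // ~500 threads
val worstCaseFetchBuffers = fetcherThreads * fetchMessageMaxBytes  // ~500 MB
{code}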

> Consumers drop from coordinator and cannot reconnect
> 
>
> Key: KAFKA-3971
> URL: https://issues.apache.org/jira/browse/KAFKA-3971
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: version 0.9.0.1
>Reporter: Lei Wang
> Attachments: KAFKA-3971.txt
>
>
> From time to time, we're creating new topics, and all consumers will pickup 
> those new topics. When starting to consume from these new topics, we often 
> see some of random consumers cannot connect to the coordinator. The log will 
> be flushed with the following log message tens of thousands every second:
> {noformat}
> 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> {noformat}
> the servers seem working fine, and other consumers are also happy.
> from the log, looks like it's retrying multiple times every millisecond but 
> all failing.
> the same process are consuming from many topics, some of them are still 
> working well, but those random topics will fail.





[jira] [Commented] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster

2016-12-05 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722341#comment-15722341
 ] 

Andrew Olson commented on KAFKA-3228:
-

Since the exception message that we saw has been removed from the code, I agree 
that this should be marked resolved. The bug description for KAFKA-4214 appears 
to be applicable to this scenario as well.

> Partition reassignment failure for brokers freshly added to cluster
> ---
>
> Key: KAFKA-3228
> URL: https://issues.apache.org/jira/browse/KAFKA-3228
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.1
>Reporter: Andrew Olson
>Assignee: Neha Narkhede
> Fix For: 0.10.1.0
>
>
> After adding about 20 new brokers to double the size of an existing 
> production Kafka deployment, when attempting to rebalance partitions we were 
> initially unable to reassign any partitions to 5 of the 20. There was no 
> problem with the other 15. The controller broker logged error messages like:
> {noformat}
> ERROR kafka.controller.KafkaController: [Controller 19]: Error completing 
> reassignment of partition [TOPIC-NAME,2]
> kafka.common.KafkaException: Only 4,33 replicas out of the new set of 
> replicas 4,34,33 for partition [TOPIC-NAME,2]
> to be reassigned are alive. Failing partition reassignment
>   at 
> kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
>   at kafka.utils.Utils$.inLock(Utils.scala:535)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195)
>   at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195)
>   at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}
> We reattempted the reassignment to one of these new brokers, with the same 
> result.
> We also saw these messages in the controller's log. There was a "Broken pipe" 
> error for each of the new brokers.
> {noformat}
> 2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: 
> [Controller-19-to-broker-34-send-thread],
> Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest...
> java.io.IOException: Broken pipe
>   at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
>   at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:148)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504)
>   at java.nio.channels.SocketChannel.write(SocketChannel.java:502)
>   at 
> kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
>   at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
>   at 
> kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
>   at kafka.network.BlockingChannel.send(BlockingChannel.scala:103)
>   at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
>   at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> {noformat}
> {noformat}
> WARN kafka.controller.RequestSendThread: 
> [Controller-19-to-broker-34-send-thread],
> Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to 
> broker id:34...
> Reconnecting to broker.
> java.io.EOFException: Received -1 when reading from channel, socket has 
> likely been closed.
>   at kafka.utils.Utils$.read(Utils.scala:381)
>   at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54

[jira] [Commented] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster

2016-12-05 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722343#comment-15722343
 ] 

Andrew Olson commented on KAFKA-3228:
-

Added fix version and duplicate link.

> Partition reassignment failure for brokers freshly added to cluster
> ---
>
> Key: KAFKA-3228
> URL: https://issues.apache.org/jira/browse/KAFKA-3228
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.1
>Reporter: Andrew Olson
>Assignee: Neha Narkhede
> Fix For: 0.10.1.0
>
>
> After adding about 20 new brokers to double the size of an existing 
> production Kafka deployment, when attempting to rebalance partitions we were 
> initially unable to reassign any partitions to 5 of the 20. There was no 
> problem with the other 15. The controller broker logged error messages like:
> {noformat}
> ERROR kafka.controller.KafkaController: [Controller 19]: Error completing 
> reassignment of partition [TOPIC-NAME,2]
> kafka.common.KafkaException: Only 4,33 replicas out of the new set of 
> replicas 4,34,33 for partition [TOPIC-NAME,2]
> to be reassigned are alive. Failing partition reassignment
>   at 
> kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
>   at kafka.utils.Utils$.inLock(Utils.scala:535)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195)
>   at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195)
>   at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}
> We reattempted the reassignment to one of these new brokers, with the same 
> result.
> We also saw these messages in the controller's log. There was a "Broken pipe" 
> error for each of the new brokers.
> {noformat}
> 2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: 
> [Controller-19-to-broker-34-send-thread],
> Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest...
> java.io.IOException: Broken pipe
>   at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
>   at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:148)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504)
>   at java.nio.channels.SocketChannel.write(SocketChannel.java:502)
>   at 
> kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
>   at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
>   at 
> kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
>   at kafka.network.BlockingChannel.send(BlockingChannel.scala:103)
>   at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
>   at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> {noformat}
> {noformat}
> WARN kafka.controller.RequestSendThread: 
> [Controller-19-to-broker-34-send-thread],
> Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to 
> broker id:34...
> Reconnecting to broker.
> java.io.EOFException: Received -1 when reading from channel, socket has 
> likely been closed.
>   at kafka.utils.Utils$.read(Utils.scala:381)
>   at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>   at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>   at 
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)

[jira] [Updated] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster

2016-12-05 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3228:

Fix Version/s: 0.10.1.0

> Partition reassignment failure for brokers freshly added to cluster
> ---
>
> Key: KAFKA-3228
> URL: https://issues.apache.org/jira/browse/KAFKA-3228
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.1
>Reporter: Andrew Olson
>Assignee: Neha Narkhede
> Fix For: 0.10.1.0
>
>
> After adding about 20 new brokers to double the size of an existing 
> production Kafka deployment, when attempting to rebalance partitions we were 
> initially unable to reassign any partitions to 5 of the 20. There was no 
> problem with the other 15. The controller broker logged error messages like:
> {noformat}
> ERROR kafka.controller.KafkaController: [Controller 19]: Error completing 
> reassignment of partition [TOPIC-NAME,2]
> kafka.common.KafkaException: Only 4,33 replicas out of the new set of 
> replicas 4,34,33 for partition [TOPIC-NAME,2]
> to be reassigned are alive. Failing partition reassignment
>   at 
> kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
>   at kafka.utils.Utils$.inLock(Utils.scala:535)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196)
>   at 
> kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195)
>   at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195)
>   at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}
> We reattempted the reassignment to one of these new brokers, with the same 
> result.
> We also saw these messages in the controller's log. There was a "Broken pipe" 
> error for each of the new brokers.
> {noformat}
> 2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: 
> [Controller-19-to-broker-34-send-thread],
> Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest...
> java.io.IOException: Broken pipe
>   at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
>   at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:148)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504)
>   at java.nio.channels.SocketChannel.write(SocketChannel.java:502)
>   at 
> kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
>   at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
>   at 
> kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
>   at kafka.network.BlockingChannel.send(BlockingChannel.scala:103)
>   at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
>   at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> {noformat}
> {noformat}
> WARN kafka.controller.RequestSendThread: 
> [Controller-19-to-broker-34-send-thread],
> Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to 
> broker id:34...
> Reconnecting to broker.
> java.io.EOFException: Received -1 when reading from channel, socket has 
> likely been closed.
>   at kafka.utils.Utils$.read(Utils.scala:381)
>   at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>   at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>   at 
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>   at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)

[jira] [Commented] (KAFKA-4092) retention.bytes should not be allowed to be less than segment.bytes

2017-03-01 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890540#comment-15890540
 ] 

Andrew Olson commented on KAFKA-4092:
-

Note that the change from this Jira is being reverted in 0.10.2.1 (see KAFKA-4788 for details).

> retention.bytes should not be allowed to be less than segment.bytes
> ---
>
> Key: KAFKA-4092
> URL: https://issues.apache.org/jira/browse/KAFKA-4092
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> Right now retention.bytes can be as small as the user wants, but it doesn't 
> really get acted on for the active segment if retention.bytes is smaller than 
> segment.bytes.  We shouldn't allow retention.bytes to be less than 
> segment.bytes, and should validate that at startup.
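For illustration, a minimal sketch of the kind of startup check being proposed (hypothetical class and parameter names, not Kafka's actual config-validation code):

{noformat}
final class RetentionConfigCheck {
    /** Hypothetical startup validation sketching the proposed constraint. */
    static void validate(long retentionBytes, long segmentBytes) {
        // retention.bytes = -1 means "no size-based retention", so only
        // non-negative values are compared against the segment size.
        if (retentionBytes >= 0 && retentionBytes < segmentBytes) {
            throw new IllegalArgumentException("retention.bytes (" + retentionBytes
                    + ") must not be less than segment.bytes (" + segmentBytes + ")");
        }
    }
}
{noformat}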



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped

2017-01-05 Thread Andrew Olson (JIRA)
Andrew Olson created KAFKA-4599:
---

 Summary: KafkaConsumer encounters SchemaException when Kafka 
broker stopped
 Key: KAFKA-4599
 URL: https://issues.apache.org/jira/browse/KAFKA-4599
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Andrew Olson


We recently observed an issue in production that can apparently occur a small 
percentage of the time when a Kafka broker is stopped. We're using version 
0.9.0.1 for all brokers and clients.

During a recent episode, 3 KafkaConsumer instances (out of approximately 100) 
ran into the following SchemaException within a few seconds of instructing the 
broker to shut down.

{noformat}
2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: 
Error reading field 'responses': Error reading array of size 2774863, only 62 
bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
{noformat}

The exception message was slightly different for one consumer,
{{Error reading field 'responses': Error reading array of size 2774863, only 
260 bytes available}}

The exception was not caught and caused the Storm Executor thread to restart, 
so it's not clear if it would have been transient or fatal for the 
KafkaConsumer.

Here are the initial broker shutdown logs,

{noformat}
2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], 
shutting down
2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: 
[ReplicaFetcherThread-1-40], Shutting down
2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
[ReplicaFetcherThread-1-40], Stopped 
2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
[ReplicaFetcherThread-1-40], Shutdown completed
2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: 
[ReplicaFetcherThread-3-30], Shutting down
2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], 
Controlled shutdown succeeded
2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on 
Broker 4], Shutting down
2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on 
Broker 4], Shutdown completed
{noformat}

We've found one very similar reported occurrence,
http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E
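For context on the impact, here is a minimal sketch of one possible defensive wrapper around poll (illustrative only and an assumption on our part, not something the client library provides; it keeps the polling loop alive but does not address the underlying protocol-read problem):

{noformat}
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.protocol.types.SchemaException;

// Hypothetical wrapper: treats the SchemaException as possibly transient and
// returns an empty batch so the caller can decide to retry or rebuild the consumer.
final class GuardedPoller {
    private final KafkaConsumer<byte[], byte[]> consumer;

    GuardedPoller(KafkaConsumer<byte[], byte[]> consumer) {
        this.consumer = consumer;
    }

    ConsumerRecords<byte[], byte[]> pollOnce(long timeoutMs) {
        try {
            return consumer.poll(timeoutMs);
        } catch (SchemaException e) {
            // Seen rarely during broker shutdown (this issue); surface an empty
            // batch instead of letting the exception kill the executor thread.
            return ConsumerRecords.empty();
        }
    }
}
{noformat}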



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped

2017-01-05 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802307#comment-15802307
 ] 

Andrew Olson commented on KAFKA-4599:
-

Another recent similar reported occurrence from the mailing list,
http://mail-archives.apache.org/mod_mbox/kafka-users/201612.mbox/%3CBY2PR02MB282F72F26267CAE415F2BA6D69B0%40BY2PR02MB282.namprd02.prod.outlook.com%3E

Could the consumer code possibly be missing logic for handling a rare broker 
response? (For example like the issue fixed by 
https://github.com/apache/kafka/pull/2070)

> KafkaConsumer encounters SchemaException when Kafka broker stopped
> --
>
> Key: KAFKA-4599
> URL: https://issues.apache.org/jira/browse/KAFKA-4599
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>    Reporter: Andrew Olson
>
> We recently observed an issue in production that can apparently occur a small 
> percentage of the time when a Kafka broker is stopped. We're using version 
> 0.9.0.1 for all brokers and clients.
> During a recent episode, 3 KafkaConsumer instances (out of approximately 100) 
> ran into the following SchemaException within a few seconds of instructing 
> the broker to shut down.
> {noformat}
> 2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: 
> Error reading field 'responses': Error reading array of size 2774863, only 62 
> bytes available
>   at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> {noformat}
> The exception message was slightly different for one consumer,
> {{Error reading field 'responses': Error reading array of size 2774863, only 
> 260 bytes available}}
> The exception was not caught and caused the Storm Executor thread to restart, 
> so it's not clear if it would have been transient or fatal for the 
> KafkaConsumer.
> Here are the initial broker shutdown logs,
> {noformat}
> 2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> shutting down
> 2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutting down
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Stopped 
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutdown completed
> 2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-3-30], Shutting down
> 2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> Controlled shutdown succeeded
> 2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutting down
> 2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutdown completed
> {noformat}
> We've found one very similar reported occurrence,
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped

2017-01-06 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804819#comment-15804819
 ] 

Andrew Olson commented on KAFKA-4599:
-

We are not using either of those. We have implemented our own Storm spout that 
manages a KafkaConsumer and commits offsets. We're on version 0.10.0 for 
Storm and 0.9.0.1 for the Kafka client.

> KafkaConsumer encounters SchemaException when Kafka broker stopped
> --
>
> Key: KAFKA-4599
> URL: https://issues.apache.org/jira/browse/KAFKA-4599
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Andrew Olson
>
> We recently observed an issue in production that can apparently occur a small 
> percentage of the time when a Kafka broker is stopped. We're using version 
> 0.9.0.1 for all brokers and clients.
> During a recent episode, 3 KafkaConsumer instances (out of approximately 100) 
> ran into the following SchemaException within a few seconds of instructing 
> the broker to shut down.
> {noformat}
> 2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: 
> Error reading field 'responses': Error reading array of size 2774863, only 62 
> bytes available
>   at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> {noformat}
> The exception message was slightly different for one consumer,
> {{Error reading field 'responses': Error reading array of size 2774863, only 
> 260 bytes available}}
> The exception was not caught and caused the Storm Executor thread to restart, 
> so it's not clear if it would have been transient or fatal for the 
> KafkaConsumer.
> Here are the initial broker shutdown logs,
> {noformat}
> 2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> shutting down
> 2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutting down
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Stopped 
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutdown completed
> 2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-3-30], Shutting down
> 2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> Controlled shutdown succeeded
> 2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutting down
> 2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutdown completed
> {noformat}
> We've found one very similar reported occurrence,
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1379) Partition reassignment resets clock for time-based retention

2017-01-20 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832155#comment-15832155
 ] 

Andrew Olson commented on KAFKA-1379:
-

[~jjkoshy] / [~becket_qin] should this Jira now be closed as a duplicate of 
KAFKA-3163?

https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index#KIP-33-Addatimebasedlogindex-Enforcetimebasedlogretention

> Partition reassignment resets clock for time-based retention
> 
>
> Key: KAFKA-1379
> URL: https://issues.apache.org/jira/browse/KAFKA-1379
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joel Koshy
>
> Since retention is driven off mod-times, reassigning partitions will result in
> data that has already been on a leader being retained for another full retention
> cycle. E.g., if retention is seven days and you reassign partitions on the
> sixth day, then those partitions will remain on the replicas for another
> seven days.
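For reference, a minimal sketch of why mod-time-driven retention behaves this way (illustrative Java, not the broker's actual log-retention code): a reassigned replica copies segment files fresh, so their modification times restart the clock.

{noformat}
import java.io.File;

final class ModTimeRetentionCheck {
    /** True if the segment is older than the retention window, judged purely by
     *  filesystem modification time. A freshly copied replica segment has a new
     *  mod-time, so it survives for another full retention cycle. */
    static boolean eligibleForDeletion(File segmentFile, long retentionMs, long nowMs) {
        return nowMs - segmentFile.lastModified() > retentionMs;
    }
}
{noformat}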



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1379) Partition reassignment resets clock for time-based retention

2017-01-23 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-1379:

Component/s: log

> Partition reassignment resets clock for time-based retention
> 
>
> Key: KAFKA-1379
> URL: https://issues.apache.org/jira/browse/KAFKA-1379
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Joel Koshy
>
> Since retention is driven off mod-times, reassigning partitions will result in
> data that has already been on a leader being retained for another full retention
> cycle. E.g., if retention is seven days and you reassign partitions on the
> sixth day, then those partitions will remain on the replicas for another
> seven days.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2172) Round-robin partition assignment strategy too restrictive

2017-02-01 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson resolved KAFKA-2172.
-
   Resolution: Fixed
 Assignee: Andrew Olson
Fix Version/s: 0.10.2.0

Fixed for the new consumer in 0.9.0.0 and the old consumer in 0.10.2.0; marking as 
resolved and setting the fix version.

> Round-robin partition assignment strategy too restrictive
> -
>
> Key: KAFKA-2172
> URL: https://issues.apache.org/jira/browse/KAFKA-2172
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Rosenberg
>    Assignee: Andrew Olson
> Fix For: 0.10.2.0
>
>
> The round-robin partition assignment strategy was introduced for the 
> high-level consumer, starting with 0.8.2.1.  This appears to be a very 
> attractive feature, but it has an unfortunate restriction, which prevents it 
> from being easily utilized.  That is, it requires that all consumers in the 
> consumer group have identical topic regex selectors, and that they have the 
> same number of consumer threads.
> It turns out this is not always the case for our deployments.  It's not 
> unusual to run multiple consumers within a single process (with different 
> topic selectors), or we might have multiple processes dedicated for different 
> topic subsets.  Agreed, we could change these to have separate group ids for 
> each sub topic selector (but unfortunately, that's easier said than done).  
> In several cases, we do at least have separate client.ids set for each 
> sub-consumer, so it would be incrementally better if we could at least loosen 
> the requirement such that each set of topics selected by a groupid/clientid 
> pair are the same.
> But, if we want to do a rolling restart for a new version of a consumer 
> config, the cluster will likely be in a state where it's not possible to have 
> a single config until the full rolling restart completes across all nodes.  
> This results in a consumer outage while the rolling restart is happening.
> Finally, it's especially problematic if we want to canary a new version for a 
> period before rolling to the whole cluster.
> I'm not sure why this restriction should exist (as it obviously does not 
> exist for the 'range' assignment strategy).  It seems it could be made to 
> work reasonably well with heterogeneous topic selection and heterogeneous 
> thread counts.  The documentation states that "The round-robin partition 
> assignor lays out all the available partitions and all the available consumer 
> threads. It then proceeds to do a round-robin assignment from partition to 
> consumer thread."
> If the assignor can "lay out all the available partitions and all the 
> available consumer threads", it should be able to uniformly assign partitions 
> to the available threads.  In each case, if a thread belongs to a consumer 
> that doesn't have that partition selected, just move to the next available 
> thread that does have the selection, etc.
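To make the suggestion concrete, a rough sketch of a round-robin pass that simply skips consumer threads whose owner has not selected a partition's topic (illustrative Java with plain string ids and hypothetical names, not the project's assignor code):

{noformat}
import java.util.*;

final class LenientRoundRobinSketch {
    /** partitions: "topic-partition" ids in a fixed global order.
     *  subscriptions: consumer thread id -> topics that thread selects.
     *  Each partition goes to the next thread around the ring that actually
     *  selects its topic; threads that do not are skipped for that partition. */
    static Map<String, List<String>> assign(List<String> partitions,
                                            Map<String, Set<String>> subscriptions) {
        List<String> consumers = new ArrayList<>(subscriptions.keySet());
        Collections.sort(consumers);
        Map<String, List<String>> assignment = new HashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        int cursor = 0;
        for (String partition : partitions) {
            String topic = partition.substring(0, partition.lastIndexOf('-'));
            // Walk the ring at most once; if nobody selects the topic, leave it unassigned.
            for (int tried = 0; tried < consumers.size(); tried++) {
                String candidate = consumers.get(cursor % consumers.size());
                cursor++;
                if (subscriptions.get(candidate).contains(topic)) {
                    assignment.get(candidate).add(partition);
                    break;
                }
            }
        }
        return assignment;
    }
}
{noformat}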



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-1379) Partition reassignment resets clock for time-based retention

2017-02-13 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864128#comment-15864128
 ] 

Andrew Olson commented on KAFKA-1379:
-

[~hachikuji] Jason, could you confirm if this bug has been fixed?

> Partition reassignment resets clock for time-based retention
> 
>
> Key: KAFKA-1379
> URL: https://issues.apache.org/jira/browse/KAFKA-1379
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Joel Koshy
>
> Since retention is driven off mod-times, reassigning partitions will result in
> data that has already been on a leader being retained for another full retention
> cycle. E.g., if retention is seven days and you reassign partitions on the
> sixth day, then those partitions will remain on the replicas for another
> seven days.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-1379) Partition reassignment resets clock for time-based retention

2017-02-13 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864128#comment-15864128
 ] 

Andrew Olson edited comment on KAFKA-1379 at 2/13/17 6:43 PM:
--

[~hachikuji] Jason, could you confirm if this bug has been fixed? According to 
http://kafka.apache.org/documentation.html#upgrade_10_1_breaking it appears so.


was (Author: noslowerdna):
[~hachikuji] Jason, could you confirm if this bug has been fixed?

> Partition reassignment resets clock for time-based retention
> 
>
> Key: KAFKA-1379
> URL: https://issues.apache.org/jira/browse/KAFKA-1379
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Joel Koshy
>
> Since retention is driven off mod-times, reassigning partitions will result in
> data that has already been on a leader being retained for another full retention
> cycle. E.g., if retention is seven days and you reassign partitions on the
> sixth day, then those partitions will remain on the replicas for another
> seven days.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-1524) Implement transactional producer

2017-02-19 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873783#comment-15873783
 ] 

Andrew Olson commented on KAFKA-1524:
-

Can this Jira be closed as obsolete? It appears to have been superseded by the 
design in 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

> Implement transactional producer
> 
>
> Key: KAFKA-1524
> URL: https://issues.apache.org/jira/browse/KAFKA-1524
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Joel Koshy
>Assignee: Raul Castro Fernandez
>  Labels: transactions
> Attachments: KAFKA-1524_2014-08-18_09:39:34.patch, 
> KAFKA-1524_2014-08-20_09:14:59.patch, KAFKA-1524.patch, KAFKA-1524.patch, 
> KAFKA-1524.patch
>
>
> Implement the basic transactional producer functionality as outlined in 
> https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> The scope of this jira is basic functionality (i.e., to be able to begin and 
> commit or abort a transaction) without the failure scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-1524) Implement transactional producer

2017-02-19 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873783#comment-15873783
 ] 

Andrew Olson edited comment on KAFKA-1524 at 2/19/17 6:16 PM:
--

Can this Jira be closed as obsolete? It appears to have been superseded by the 
design in 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

Same comment applies to several other open Jiras with the "transactions" label.


was (Author: noslowerdna):
Can this Jira be closed as obsolete? It appears to have been superseded by the 
design in 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

> Implement transactional producer
> 
>
> Key: KAFKA-1524
> URL: https://issues.apache.org/jira/browse/KAFKA-1524
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Joel Koshy
>Assignee: Raul Castro Fernandez
>  Labels: transactions
> Attachments: KAFKA-1524_2014-08-18_09:39:34.patch, 
> KAFKA-1524_2014-08-20_09:14:59.patch, KAFKA-1524.patch, KAFKA-1524.patch, 
> KAFKA-1524.patch
>
>
> Implement the basic transactional producer functionality as outlined in 
> https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> The scope of this jira is basic functionality (i.e., to be able to begin and 
> commit or abort a transaction) without the failure scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-3866) KerberosLogin refresh time bug and other improvements

2017-02-22 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878152#comment-15878152
 ] 

Andrew Olson commented on KAFKA-3866:
-

We saw the following error when setting a very low expiration time (-maxlife 
"30 seconds" and -maxrenewlife "90 seconds"),

{noformat}ERROR NextRefresh: Tue Feb 21 12:42:59 CST 2017 is in the past: 
exiting refresh thread. Check clock sync between this host and KDC - (KDC's 
clock is likely ahead of this host). Manual intervention will be required for 
this client to successfully authenticate. Exiting refresh thread. 
(org.apache.kafka.common.security.kerberos.KerberosLogin){noformat}

Would this change fix that scenario?
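Purely to illustrate the failure mode (hypothetical numbers and names, not the KerberosLogin code): with a 30-second ticket life, the computed refresh point can already lie in the past by the time it is checked, which is the condition that logs the error above and exits the refresh thread.

{noformat}
final class RefreshTimeSketch {
    /** Sketch of a next-refresh computation: pick a point most of the way through
     *  the ticket window. With a very short window this can land in the past. */
    static long nextRefreshMs(long authTimeMs, long expiryMs) {
        double windowFactor = 0.8; // hypothetical fraction of the ticket lifetime
        return authTimeMs + (long) ((expiryMs - authTimeMs) * windowFactor);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long auth = now - 29_000L;    // ticket obtained 29 seconds ago
        long expiry = auth + 30_000L; // -maxlife "30 seconds"
        long refresh = nextRefreshMs(auth, expiry);
        // refresh == auth + 24s == now - 5s, i.e. already in the past, so a refresh
        // thread that refuses past times gives up instead of re-logging in.
        System.out.println(refresh < now ? "refresh point already in the past" : "ok");
    }
}
{noformat}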

> KerberosLogin refresh time bug and other improvements
> -
>
> Key: KAFKA-3866
> URL: https://issues.apache.org/jira/browse/KAFKA-3866
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.10.0.0
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> ZOOKEEPER-2295 describes a bug in the Kerberos refresh time logic that is 
> also present in our KerberosLogin class. While looking at the code, I found a 
> number of things that could be improved. More details in the PR.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-3866) KerberosLogin refresh time bug and other improvements

2017-02-22 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878152#comment-15878152
 ] 

Andrew Olson edited comment on KAFKA-3866 at 2/22/17 1:06 PM:
--

We saw the following error when setting a very low expiration time (-maxlife 
"30 seconds" and -maxrenewlife "90 seconds"),

{noformat}ERROR NextRefresh: Tue Feb 21 12:42:59 CST 2017 is in the past: 
exiting refresh
thread. Check clock sync between this host and KDC - (KDC's clock is likely 
ahead of this host). 
Manual intervention will be required for this client to successfully 
authenticate. Exiting refresh 
thread. (org.apache.kafka.common.security.kerberos.KerberosLogin){noformat}

Would this change fix that scenario?


was (Author: noslowerdna):
We saw the following error when setting a very low expiration time (-maxlife 
"30 seconds" and -maxrenewlife "90 seconds"),

{noformat}ERROR NextRefresh: Tue Feb 21 12:42:59 CST 2017 is in the past: 
exiting refresh thread. Check clock sync between this host and KDC - (KDC's 
clock is likely ahead of this host). Manual intervention will be required for 
this client to successfully authenticate. Exiting refresh thread. 
(org.apache.kafka.common.security.kerberos.KerberosLogin){noformat}

Would this change fix that scenario?

> KerberosLogin refresh time bug and other improvements
> -
>
> Key: KAFKA-3866
> URL: https://issues.apache.org/jira/browse/KAFKA-3866
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.10.0.0
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> ZOOKEEPER-2295 describes a bug in the Kerberos refresh time logic that is 
> also present in our KerberosLogin class. While looking at the code, I found a 
> number of things that could be improved. More details in the PR.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster

2016-02-11 Thread Andrew Olson (JIRA)
Andrew Olson created KAFKA-3228:
---

 Summary: Partition reassignment failure for brokers freshly added 
to cluster
 Key: KAFKA-3228
 URL: https://issues.apache.org/jira/browse/KAFKA-3228
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 0.8.2.1
Reporter: Andrew Olson
Assignee: Neha Narkhede


After adding about 20 new brokers to double the size of an existing production 
Kafka deployment, when attempting to rebalance partitions we were initially 
unable to reassign any partitions to 5 of the 20. The controller broker logged 
error messages like:

{noformat}
ERROR kafka.controller.KafkaController: [Controller 19]: Error completing 
reassignment of partition [TOPIC-NAME,2]
kafka.common.KafkaException: Only 4,33 replicas out of the new set of replicas 
4,34,33 for partition [TOPIC-NAME,2]
to be reassigned are alive. Failing partition reassignment
at 
kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
at kafka.utils.Utils$.inLock(Utils.scala:535)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195)
at 
scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
at 
scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at 
kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195)
at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
{noformat}

We reattempted the reassignment to one of these new brokers, with the same 
result.

We also saw these messages in the controller's log. There was a "Broken pipe" 
error for each of the new brokers.

{noformat}
2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: 
[Controller-19-to-broker-34-send-thread],
Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest...
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.write(IOUtil.java:148)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504)
at java.nio.channels.SocketChannel.write(SocketChannel.java:502)
at 
kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
at 
kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:103)
at 
kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
at 
kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{noformat}

{noformat}
WARN kafka.controller.RequestSendThread: 
[Controller-19-to-broker-34-send-thread],
Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to 
broker id:34...
Reconnecting to broker.
java.io.EOFException: Received -1 when reading from channel, socket has likely 
been closed.
at kafka.utils.Utils$.read(Utils.scala:381)
at 
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at 
kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
at 
kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133)
at 
kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{noformat}

{noformat}
INFO kafka.controller.RequestSendThread: 
[Controller-19-to-broker-34-send-thread], Controller 19 connected
to id:34... for sending state change requests
{noformat}

There were no error messages in the new broker log files, just the normal 
startup logs. A jstack did not reveal anything unusual with the threads, and 
using netstat the network connection

[jira] [Updated] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster

2016-02-11 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3228:

Description: 
After adding about 20 new brokers to double the size of an existing production 
Kafka deployment, when attempting to rebalance partitions we were initially 
unable to reassign any partitions to 5 of the 20. There was no problem with the 
other 15. The controller broker logged error messages like:

{noformat}
ERROR kafka.controller.KafkaController: [Controller 19]: Error completing 
reassignment of partition [TOPIC-NAME,2]
kafka.common.KafkaException: Only 4,33 replicas out of the new set of replicas 
4,34,33 for partition [TOPIC-NAME,2]
to be reassigned are alive. Failing partition reassignment
at 
kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
at kafka.utils.Utils$.inLock(Utils.scala:535)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195)
at 
scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
at 
scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at 
kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195)
at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
{noformat}

We reattempted the reassignment to one of these new brokers, with the same 
result.

We also saw these messages in the controller's log. There was a "Broken pipe" 
error for each of the new brokers.

{noformat}
2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: 
[Controller-19-to-broker-34-send-thread],
Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest...
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.write(IOUtil.java:148)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504)
at java.nio.channels.SocketChannel.write(SocketChannel.java:502)
at 
kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
at 
kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:103)
at 
kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
at 
kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{noformat}

{noformat}
WARN kafka.controller.RequestSendThread: 
[Controller-19-to-broker-34-send-thread],
Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to 
broker id:34...
Reconnecting to broker.
java.io.EOFException: Received -1 when reading from channel, socket has likely 
been closed.
at kafka.utils.Utils$.read(Utils.scala:381)
at 
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at 
kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
at 
kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133)
at 
kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{noformat}

{noformat}
INFO kafka.controller.RequestSendThread: 
[Controller-19-to-broker-34-send-thread], Controller 19 connected
to id:34... for sending state change requests
{noformat}

There were no error messages in the new broker log files, just the normal 
startup logs. A jstack did not reveal anything unusual with the threads, and 
using netstat the network connections looked normal.

We're running version 0.8.2.1. The new brokers were simultaneously started  
using a broadcast-style command. However we also had the same issue with a 
different Kafka cluster after

[jira] [Updated] (KAFKA-3228) Partition reassignment failure for brokers freshly added to cluster

2016-02-11 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3228:

Description: 
After adding about 20 new brokers to double the size of an existing production 
Kafka deployment, when attempting to rebalance partitions we were initially 
unable to reassign any partitions to 5 of the 20. There was no problem with the 
other 15. The controller broker logged error messages like:

{noformat}
ERROR kafka.controller.KafkaController: [Controller 19]: Error completing 
reassignment of partition [TOPIC-NAME,2]
kafka.common.KafkaException: Only 4,33 replicas out of the new set of replicas 
4,34,33 for partition [TOPIC-NAME,2]
to be reassigned are alive. Failing partition reassignment
at 
kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
at kafka.utils.Utils$.inLock(Utils.scala:535)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196)
at 
kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195)
at 
scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
at 
scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at 
kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195)
at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
{noformat}

We reattempted the reassignment to one of these new brokers, with the same 
result.

We also saw these messages in the controller's log. There was a "Broken pipe" 
error for each of the new brokers.

{noformat}
2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: 
[Controller-19-to-broker-34-send-thread],
Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest...
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.write(IOUtil.java:148)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504)
at java.nio.channels.SocketChannel.write(SocketChannel.java:502)
at 
kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
at 
kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:103)
at 
kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
at 
kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{noformat}

{noformat}
WARN kafka.controller.RequestSendThread: 
[Controller-19-to-broker-34-send-thread],
Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to 
broker id:34...
Reconnecting to broker.
java.io.EOFException: Received -1 when reading from channel, socket has likely 
been closed.
at kafka.utils.Utils$.read(Utils.scala:381)
at 
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at 
kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
at 
kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133)
at 
kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{noformat}

{noformat}
INFO kafka.controller.RequestSendThread: 
[Controller-19-to-broker-34-send-thread], Controller 19 connected
to id:34... for sending state change requests
{noformat}

There were no error messages in the new broker log files, just the normal 
startup logs. A jstack did not reveal anything unusual with the threads, and 
using netstat the network connections looked normal.

We're running version 0.8.2.1. The new brokers were simultaneously started  
using a broadcast-style command. However we also had the same issue with a 
different Kafka cluster after

[jira] [Commented] (KAFKA-493) High CPU usage on inactive server

2016-02-24 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163413#comment-15163413
 ] 

Andrew Olson commented on KAFKA-493:


[~cosmin.marginean] You might try upgrading to 0.9.0.1, which was just released 
last week. It has a couple of bug fixes (KAFKA-3159 and KAFKA-3003) that may be 
applicable.

> High CPU usage on inactive server
> -
>
> Key: KAFKA-493
> URL: https://issues.apache.org/jira/browse/KAFKA-493
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0
>Reporter: Jay Kreps
> Fix For: 0.10.1.0
>
> Attachments: Kafka-2014-11-10.snapshot.zip, Kafka-sampling1.zip, 
> Kafka-sampling2.zip, Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, 
> Kafka-trace3.zip, backtraces.txt, stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
> > usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see 
> > anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two 
> > machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers 
> > there is no idle CPU activity. 
> > Then I run a test that essential does 10k publish operations followed by 
> > immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to 
> > consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", 
> group="system")
> THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 21, name="main", group="main")
> THREAD START (obj=53ae, id = 27, name="Thread-2", group="main")
> THREAD START (obj=53ae, id = 28, name="Thread-3", group="main")
> THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", 
> group="main")
> THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", 
> group="main")
> THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=574b, id = 200012, 
> name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=576e, id = 200014, name="main-SendThread()", 
> group="main")
> THREAD START (obj=576d, id = 200013, name="main-EventThread", 
> group="main")
> THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", 
> group="main")
> THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", 
> group="main")
> THREAD START (obj=53ae, id = 200017, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200018, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", 
> group="main")
> THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", 
> group="main")
> THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on 
> broker 1, ", group="main")
> THREAD START (obj=53ae, id = 200028, name="SIGINT handler", 
> group="system")
> THREAD START (obj=53ae, id = 200029, name="Thread-5", group="main")
> THREAD START (obj=574b, id = 200030, name="Thread-1", group="main")
> THR

[jira] [Commented] (KAFKA-2435) More optimally balanced partition assignment strategy

2016-02-25 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167424#comment-15167424
 ] 

Andrew Olson commented on KAFKA-2435:
-

[~hachikuji] The pull request for this 
(https://github.com/apache/kafka/pull/146) has been rebased. Please let me know 
if there are any concerns that need to be addressed for this enhancement to be 
accepted.

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
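To make the two rules above concrete, a rough, simplified sketch of the assignment loop (illustrative Java with plain string ids, not the code in the attached patch; every topic is assumed to have at least one subscriber):

{noformat}
import java.util.*;

final class FairAssignmentSketch {
    /** partitionCounts: topic -> number of partitions.
     *  subscribers: topic -> consumers subscribed to that topic.
     *  Returns consumer -> assigned "topic-partition" ids. */
    static Map<String, List<String>> assign(Map<String, Integer> partitionCounts,
                                            Map<String, List<String>> subscribers) {
        Map<String, List<String>> assignment = new HashMap<>();
        for (List<String> consumers : subscribers.values()) {
            for (String c : consumers) {
                assignment.putIfAbsent(c, new ArrayList<>());
            }
        }

        // Most constrained topics first: fewer subscribers, then more partitions.
        List<String> topics = new ArrayList<>(partitionCounts.keySet());
        topics.sort(Comparator
                .comparingInt((String t) -> subscribers.get(t).size())
                .thenComparingInt((String t) -> -partitionCounts.get(t)));

        for (String topic : topics) {
            for (int p = 0; p < partitionCounts.get(topic); p++) {
                // Give this partition to the interested consumer holding the fewest so far.
                String leastLoaded = null;
                for (String c : subscribers.get(topic)) {
                    if (leastLoaded == null
                            || assignment.get(c).size() < assignment.get(leastLoaded).size()) {
                        leastLoaded = c;
                    }
                }
                assignment.get(leastLoaded).add(topic + "-" + p);
            }
        }
        return assignment;
    }
}
{noformat}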



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)

2016-02-25 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167427#comment-15167427
 ] 

Andrew Olson commented on KAFKA-2434:
-

[~junrao] The pull request for this (https://github.com/apache/kafka/pull/145) 
has been rebased. Please let me know if there are any concerns that need to be 
addressed for this patch to be accepted.

> remove roundrobin identical topic constraint in consumer coordinator (old API)
> --
>
> Key: KAFKA-2434
> URL: https://issues.apache.org/jira/browse/KAFKA-2434
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Attachments: KAFKA-2434.patch
>
>
> The roundrobin strategy algorithm improvement made in KAFKA-2196 should be 
> applied to the original high-level consumer API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2172) Round-robin partition assignment strategy too restrictive

2016-02-25 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167431#comment-15167431
 ] 

Andrew Olson commented on KAFKA-2172:
-

Note that the code changed in KAFKA-2196 was relocated to the new consumer 
library (+ ported from Scala -> Java) in KAFKA-2464.

> Round-robin partition assignment strategy too restrictive
> -
>
> Key: KAFKA-2172
> URL: https://issues.apache.org/jira/browse/KAFKA-2172
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Rosenberg
>
> The round-robin partition assignment strategy was introduced for the 
> high-level consumer, starting with 0.8.2.1.  This appears to be a very 
> attractive feature, but it has an unfortunate restriction, which prevents it 
> from being easily utilized.  That is that it requires all consumers in the 
> consumer group have identical topic regex selectors, and that they have the 
> same number of consumer threads.
> It turns out this is not always the case for our deployments.  It's not 
> unusual to run multiple consumers within a single process (with different 
> topic selectors), or we might have multiple processes dedicated for different 
> topic subsets.  Agreed, we could change these to have separate group ids for 
> each sub topic selector (but unfortunately, that's easier said than done).  
> In several cases, we do at least have separate client.ids set for each 
> sub-consumer, so it would be incrementally better if we could at least loosen 
> the requirement such that each set of topics selected by a groupid/clientid 
> pair is the same.
> But, if we want to do a rolling restart for a new version of a consumer 
> config, the cluster will likely be in a state where it's not possible to have 
> a single config until the full rolling restart completes across all nodes.  
> This results in a consumer outage while the rolling restart is happening.
> Finally, it's especially problematic if we want to canary a new version for a 
> period before rolling to the whole cluster.
> I'm not sure why this restriction should exist (as it obviously does not 
> exist for the 'range' assignment strategy).  It seems it could be made to 
> work reasonably well with heterogeneous topic selection and heterogeneous 
> thread counts.  The documentation states that "The round-robin partition 
> assignor lays out all the available partitions and all the available consumer 
> threads. It then proceeds to do a round-robin assignment from partition to 
> consumer thread."
> If the assignor can "lay out all the available partitions and all the 
> available consumer threads", it should be able to uniformly assign partitions 
> to the available threads.  In each case, if a thread belongs to a consumer 
> that doesn't have that partition selected, just move to the next available 
> thread that does have the selection, etc.
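
To make the suggestion concrete, here is a minimal Java sketch of a round-robin
pass that tolerates heterogeneous subscriptions by skipping threads that are not
subscribed to a partition's topic. This is illustrative only -- the class, method,
and data-structure names are invented and it is not the actual Kafka assignor.

import java.util.*;

// Illustrative only: a round-robin assignment that skips threads whose
// subscriptions do not include the partition's topic.
public class RoundRobinSketch {

    // partitions: entries like "topicA-0"; subscriptions: thread id -> topics it selects
    public static Map<String, List<String>> assign(List<String> partitions,
                                                   Map<String, Set<String>> subscriptions) {
        Map<String, List<String>> assignment = new HashMap<>();
        subscriptions.keySet().forEach(t -> assignment.put(t, new ArrayList<>()));

        List<String> threads = new ArrayList<>(subscriptions.keySet());
        Collections.sort(threads); // deterministic circular order
        int next = 0;
        for (String partition : partitions) {
            String topic = partition.substring(0, partition.lastIndexOf('-'));
            // advance around the circle until a thread subscribed to this topic is found
            for (int tried = 0; tried < threads.size(); tried++) {
                String thread = threads.get(next);
                next = (next + 1) % threads.size();
                if (subscriptions.get(thread).contains(topic)) {
                    assignment.get(thread).add(partition);
                    break;
                }
            }
        }
        return assignment;
    }
}

With identical subscriptions this degenerates to plain round-robin; with differing
subscriptions every partition still lands on an interested thread, although the
per-thread counts can become uneven, which is what KAFKA-2435 then tries to address.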



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-02-26 Thread Andrew Olson (JIRA)
Andrew Olson created KAFKA-3297:
---

 Summary: More optimally balanced partition assignment strategy 
(new consumer)
 Key: KAFKA-3297
 URL: https://issues.apache.org/jira/browse/KAFKA-3297
 Project: Kafka
  Issue Type: Improvement
Reporter: Andrew Olson
Assignee: Andrew Olson


While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.
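
As a rough illustration of the assignment loop described above, here is a minimal
Java sketch; the names are hypothetical and this is only a sketch of the idea, not
the proposed patch.

import java.util.*;

// Sketch only: hand each partition to the interested consumer that currently
// holds the fewest partitions, so the final counts stay as even as possible.
public class LeastLoadedAssignmentSketch {

    // partitions: entries like "topicA-0"; subscriptions: consumer id -> subscribed topics
    public static Map<String, List<String>> assign(List<String> partitions,
                                                   Map<String, Set<String>> subscriptions) {
        Map<String, List<String>> assignment = new HashMap<>();
        subscriptions.keySet().forEach(c -> assignment.put(c, new ArrayList<>()));

        for (String partition : partitions) {
            String topic = partition.substring(0, partition.lastIndexOf('-'));
            String leastLoaded = null;
            for (String consumer : subscriptions.keySet()) {
                if (!subscriptions.get(consumer).contains(topic))
                    continue; // consumer is not interested in this topic
                if (leastLoaded == null
                        || assignment.get(consumer).size() < assignment.get(leastLoaded).size())
                    leastLoaded = consumer;
            }
            if (leastLoaded != null)
                assignment.get(leastLoaded).add(partition);
        }
        return assignment;
    }
}

The two ordering rules above only change the order in which partitions are fed
into this loop; the loop itself is what keeps the per-consumer counts balanced.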



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-02-26 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3297:

Description: 
While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.

This JIRA addresses the new consumer.

  was:
While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.


> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>Assignee: Andrew Olson
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy

2016-02-26 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-2435:

Description: 
While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.

This JIRA addresses the original high-level consumer. For the new consumer, see 
KAFKA-3297.

  was:
While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.

This JIRA addresses the original high-level consumer.


> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>Assignee: Andrew Olson
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the original high-level consumer. For the new consumer, 
> see KAFKA-3297.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy

2016-02-26 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-2435:

Description: 
While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.

This JIRA addresses the original high-level consumer.

  was:
While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.


> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>Assignee: Andrew Olson
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the original high-level consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-02-26 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3297:

Description: 
While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.

This JIRA addresses the new consumer. For the original high-level consumer, see 
KAFKA-2435.

  was:
While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.

This JIRA addresses the new consumer.


> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>Assignee: Andrew Olson
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer. For the original high-level consumer, 
> see KAFKA-2435.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-02-26 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3297 started by Andrew Olson.
---
> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer. For the original high-level consumer, 
> see KAFKA-2435.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3298) Document unclean.leader.election.enable as a valid topic-level config

2016-02-26 Thread Andrew Olson (JIRA)
Andrew Olson created KAFKA-3298:
---

 Summary: Document unclean.leader.election.enable as a valid 
topic-level config
 Key: KAFKA-3298
 URL: https://issues.apache.org/jira/browse/KAFKA-3298
 Project: Kafka
  Issue Type: Bug
  Components: website
Reporter: Andrew Olson
Priority: Minor


The http://kafka.apache.org/documentation.html#topic-config section does not 
currently include {{unclean.leader.election.enable}}. This is a valid 
topic-level configuration property.
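
Assuming the standard topic-level override mechanism applies, the property can be
set on an existing topic with the alter-configs tool, for example (the topic name
and ZooKeeper address below are placeholders):

bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics \
  --entity-name my-topic --alter --add-config unclean.leader.election.enable=false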



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3298) Document unclean.leader.election.enable as a valid topic-level config

2016-02-26 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3298:

Description: 
The http://kafka.apache.org/documentation.html#topic-config section does not 
currently include {{unclean.leader.election.enable}}. That is a valid 
topic-level configuration property as demonstrated by this [1] test.

[1] 
https://github.com/apache/kafka/blob/0.9.0.1/core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala#L127

  was:The http://kafka.apache.org/documentation.html#topic-config section does 
not currently include {{unclean.leader.election.enable}}. This is a valid 
topic-level configuration property.


> Document unclean.leader.election.enable as a valid topic-level config
> -
>
> Key: KAFKA-3298
> URL: https://issues.apache.org/jira/browse/KAFKA-3298
> Project: Kafka
>  Issue Type: Bug
>  Components: website
>    Reporter: Andrew Olson
>Priority: Minor
>
> The http://kafka.apache.org/documentation.html#topic-config section does not 
> currently include {{unclean.leader.election.enable}}. That is a valid 
> topic-level configuration property as demonstrated by this [1] test.
> [1] 
> https://github.com/apache/kafka/blob/0.9.0.1/core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala#L127



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-03-01 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3297:

Status: Patch Available  (was: In Progress)

> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer. For the original high-level consumer, 
> see KAFKA-2435.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2435) More optimally balanced partition assignment strategy

2016-03-01 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174235#comment-15174235
 ] 

Andrew Olson commented on KAFKA-2435:
-

KIP link: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-49+-+Fair+Partition+Assignment+Strategy

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the original high-level consumer. For the new consumer, 
> see KAFKA-3297.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-03-01 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174237#comment-15174237
 ] 

Andrew Olson commented on KAFKA-3297:
-

KIP link: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-49+-+Fair+Partition+Assignment+Strategy

> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer. For the original high-level consumer, 
> see KAFKA-2435.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-03-07 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-3297:

Fix Version/s: 0.10.0.0

> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Fix For: 0.10.0.0
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer. For the original high-level consumer, 
> see KAFKA-2435.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy

2016-03-07 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-2435:

Fix Version/s: 0.10.0.0

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Fix For: 0.10.0.0
>
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the original high-level consumer. For the new consumer, 
> see KAFKA-3297.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)

2016-03-07 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-2434:

Fix Version/s: 0.10.0.0

> remove roundrobin identical topic constraint in consumer coordinator (old API)
> --
>
> Key: KAFKA-2434
> URL: https://issues.apache.org/jira/browse/KAFKA-2434
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Fix For: 0.10.0.0
>
> Attachments: KAFKA-2434.patch
>
>
> The roundrobin strategy algorithm improvement made in KAFKA-2196 should be 
> applied to the original high-level consumer API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2196) remove roundrobin identical topic constraint in consumer coordinator

2015-08-10 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680204#comment-14680204
 ] 

Andrew Olson commented on KAFKA-2196:
-

Doesn't this [1] code also need to be updated?

[1] 
https://github.com/apache/kafka/blob/0.8.2.1/core/src/main/scala/kafka/consumer/PartitionAssignor.scala#L78-82

> remove roundrobin identical topic constraint in consumer coordinator
> 
>
> Key: KAFKA-2196
> URL: https://issues.apache.org/jira/browse/KAFKA-2196
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Onur Karaman
> Attachments: KAFKA-2196.patch
>
>
> roundrobin doesn't need to make all consumers have identical topic 
> subscriptions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)

2015-08-14 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson reassigned KAFKA-2434:
---

Assignee: Andrew Olson

> remove roundrobin identical topic constraint in consumer coordinator (old API)
> --
>
> Key: KAFKA-2434
> URL: https://issues.apache.org/jira/browse/KAFKA-2434
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
>
> The roundrobin strategy algorithm improvement made in KAFKA-2196 should be 
> applied to the original high-level consumer API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)

2015-08-14 Thread Andrew Olson (JIRA)
Andrew Olson created KAFKA-2434:
---

 Summary: remove roundrobin identical topic constraint in consumer 
coordinator (old API)
 Key: KAFKA-2434
 URL: https://issues.apache.org/jira/browse/KAFKA-2434
 Project: Kafka
  Issue Type: Sub-task
Reporter: Andrew Olson


The roundrobin strategy algorithm improvement made in KAFKA-2196 should be 
applied to the original high-level consumer API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2196) remove roundrobin identical topic constraint in consumer coordinator

2015-08-14 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697189#comment-14697189
 ] 

Andrew Olson commented on KAFKA-2196:
-

> Doesn't this [1] code also need to be updated?

Opened KAFKA-2434 for removing this constraint from the old consumer API.

> remove roundrobin identical topic constraint in consumer coordinator
> 
>
> Key: KAFKA-2196
> URL: https://issues.apache.org/jira/browse/KAFKA-2196
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Onur Karaman
> Attachments: KAFKA-2196.patch
>
>
> roundrobin doesn't need to make all consumers have identical topic 
> subscriptions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)

2015-08-14 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697192#comment-14697192
 ] 

Andrew Olson commented on KAFKA-2434:
-

I will submit a patch for this change shortly. 

> remove roundrobin identical topic constraint in consumer coordinator (old API)
> --
>
> Key: KAFKA-2434
> URL: https://issues.apache.org/jira/browse/KAFKA-2434
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
>
> The roundrobin strategy algorithm improvement made in KAFKA-2196 should be 
> applied to the original high-level consumer API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2435) More optimally balanced partition assignment strategy

2015-08-14 Thread Andrew Olson (JIRA)
Andrew Olson created KAFKA-2435:
---

 Summary: More optimally balanced partition assignment strategy
 Key: KAFKA-2435
 URL: https://issues.apache.org/jira/browse/KAFKA-2435
 Project: Kafka
  Issue Type: Improvement
Reporter: Andrew Olson
Assignee: Andrew Olson


While the roundrobin partition assignment strategy is an improvement over the 
range strategy, when the consumer topic subscriptions are not identical 
(previously disallowed but will be possible as of KAFKA-2172) it can produce 
heavily skewed assignments. As suggested 
[here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
 it would be nice to have a strategy that attempts to assign an equal number of 
partitions to each consumer in a group, regardless of how similar their 
individual topic subscriptions are. We can accomplish this by tracking the 
number of partitions assigned to each consumer, and having the partition 
assignment loop assign each partition to a consumer interested in that topic 
with the least number of partitions assigned. 

Additionally, we can optimize the distribution fairness by adjusting the 
partition assignment order:
* Topics with fewer consumers are assigned first.
* In the event of a tie for least consumers, the topic with more partitions is 
assigned first.

The general idea behind these two rules is to keep the most flexible assignment 
choices available as long as possible by starting with the most constrained 
partitions/consumers.
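
For illustration, the two ordering rules above could be expressed as a comparator
along these lines; this is a sketch with invented names, not the actual patch.

import java.util.*;

// Sketch only: order topics so that the most constrained ones are assigned first.
public class TopicOrderSketch {

    // consumersPerTopic: topic -> number of interested consumers
    // partitionsPerTopic: topic -> number of partitions
    public static List<String> orderTopics(Map<String, Integer> consumersPerTopic,
                                           Map<String, Integer> partitionsPerTopic) {
        List<String> topics = new ArrayList<>(consumersPerTopic.keySet());
        topics.sort(Comparator
                .<String>comparingInt(t -> consumersPerTopic.get(t))        // rule 1: fewer consumers first
                .thenComparing((String t) -> -partitionsPerTopic.get(t)));  // rule 2: more partitions first on a tie
        return topics;
    }
}

Sorting on the negated partition count is just a compact way to get descending
order for the tie-break.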



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)

2015-08-14 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697264#comment-14697264
 ] 

Andrew Olson commented on KAFKA-2434:
-

Created reviewboard https://reviews.apache.org/r/37480/diff/
 against branch origin/trunk

> remove roundrobin identical topic constraint in consumer coordinator (old API)
> --
>
> Key: KAFKA-2434
> URL: https://issues.apache.org/jira/browse/KAFKA-2434
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Attachments: KAFKA-2434.patch
>
>
> The roundrobin strategy algorithm improvement made in KAFKA-2196 should be 
> applied to the original high-level consumer API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)

2015-08-14 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-2434:

Status: Patch Available  (was: Open)

> remove roundrobin identical topic constraint in consumer coordinator (old API)
> --
>
> Key: KAFKA-2434
> URL: https://issues.apache.org/jira/browse/KAFKA-2434
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Attachments: KAFKA-2434.patch
>
>
> The roundrobin strategy algorithm improvement made in KAFKA-2196 should be 
> applied to the original high-level consumer API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)

2015-08-14 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-2434:

Attachment: KAFKA-2434.patch

> remove roundrobin identical topic constraint in consumer coordinator (old API)
> --
>
> Key: KAFKA-2434
> URL: https://issues.apache.org/jira/browse/KAFKA-2434
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Attachments: KAFKA-2434.patch
>
>
> The roundrobin strategy algorithm improvement made in KAFKA-2196 should be 
> applied to the original high-level consumer API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2435) More optimally balanced partition assignment strategy

2015-08-14 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697266#comment-14697266
 ] 

Andrew Olson commented on KAFKA-2435:
-

Created reviewboard https://reviews.apache.org/r/37481/diff/
 against branch origin/trunk

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy

2015-08-14 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-2435:

Status: Patch Available  (was: Open)

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2435) More optimally balanced partition assignment strategy

2015-08-14 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-2435:

Attachment: KAFKA-2435.patch

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Andrew Olson
>    Assignee: Andrew Olson
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2172) Round-robin partition assignment strategy too restrictive

2015-08-14 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697662#comment-14697662
 ] 

Andrew Olson commented on KAFKA-2172:
-

[~jjkoshy] I've implemented a new assignment algorithm similar to what Bryan 
described above that appears to work reasonably well across a wide variety of 
scenarios - see KAFKA-2435.

> Round-robin partition assignment strategy too restrictive
> -
>
> Key: KAFKA-2172
> URL: https://issues.apache.org/jira/browse/KAFKA-2172
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Rosenberg
>
> The round-robin partition assignment strategy was introduced for the 
> high-level consumer, starting with 0.8.2.1.  This appears to be a very 
> attractive feature, but it has an unfortunate restriction, which prevents it 
> from being easily utilized.  That is that it requires all consumers in the 
> consumer group have identical topic regex selectors, and that they have the 
> same number of consumer threads.
> It turns out this is not always the case for our deployments.  It's not 
> unusual to run multiple consumers within a single process (with different 
> topic selectors), or we might have multiple processes dedicated for different 
> topic subsets.  Agreed, we could change these to have separate group ids for 
> each sub topic selector (but unfortunately, that's easier said than done).  
> In several cases, we do at least have separate client.ids set for each 
> sub-consumer, so it would be incrementally better if we could at least loosen 
> the requirement such that each set of topics selected by a groupid/clientid 
> pair is the same.
> But, if we want to do a rolling restart for a new version of a consumer 
> config, the cluster will likely be in a state where it's not possible to have 
> a single config until the full rolling restart completes across all nodes.  
> This results in a consumer outage while the rolling restart is happening.
> Finally, it's especially problematic if we want to canary a new version for a 
> period before rolling to the whole cluster.
> I'm not sure why this restriction should exist (as it obviously does not 
> exist for the 'range' assignment strategy).  It seems it could be made to 
> work reasonably well with heterogeneous topic selection and heterogeneous 
> thread counts.  The documentation states that "The round-robin partition 
> assignor lays out all the available partitions and all the available consumer 
> threads. It then proceeds to do a round-robin assignment from partition to 
> consumer thread."
> If the assignor can "lay out all the available partitions and all the 
> available consumer threads", it should be able to uniformly assign partitions 
> to the available threads.  In each case, if a thread belongs to a consumer 
> that doesn't have that partition selected, just move to the next available 
> thread that does have the selection, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2435) More optimally balanced partition assignment strategy

2015-08-17 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699565#comment-14699565
 ] 

Andrew Olson commented on KAFKA-2435:
-

[~becket_qin] Thanks for the info, I'll send a pull request. This new strategy 
specifically targets the case where the consumer topic subscriptions are not 
all identical, which I don't believe that KAFKA-2019 addresses -- it looks like 
it would probably reduce the potential for a large imbalance, but only if there 
is more than one thread per consumer.

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Andrew Olson
>Assignee: Andrew Olson
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4152) Reduce severity level of metadata fetch failure logging for nonexistent topics

2016-09-12 Thread Andrew Olson (JIRA)
Andrew Olson created KAFKA-4152:
---

 Summary: Reduce severity level of metadata fetch failure logging 
for nonexistent topics
 Key: KAFKA-4152
 URL: https://issues.apache.org/jira/browse/KAFKA-4152
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Reporter: Andrew Olson


If a consumer proactively subscribes to one or more topics that don't already 
exist, but are expected to exist in the near future, warnings are repeatedly 
logged by the NetworkClient throughout the consumer's lifetime until the topic 
eventually gets created, like:

{noformat}
org.apache.kafka.clients.NetworkClient [WARN]
Error while fetching metadata with correlation id 1 : 
{MY.NEW.TOPIC=UNKNOWN_TOPIC_OR_PARTITION,
ANOTHER.NEW.TOPIC=UNKNOWN_TOPIC_OR_PARTITION}
{noformat}

The NetworkClient's warning logging code for metadata fetch failures is rather 
generic, but could potentially examine the reason to log at debug level for 
UNKNOWN_TOPIC_OR_PARTITION and warn for all others. Since these warnings can be 
very valuable for troubleshooting in some situations, a reasonable approach 
might be to remember the unknown topics that a warning has already been logged 
for, and reduce the level from warning to debug for subsequent failures for the 
same topics with the same common cause of not (yet, presumably) existing, 
although that does introduce some undesirable complexity.
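
A minimal sketch of that remember-and-demote idea (illustrative only, not the actual 
NetworkClient code; java.util.logging and the method name here merely stand in for the 
real logger and call site):

{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

public class UnknownTopicWarnOnce {

    private static final Logger log = Logger.getLogger(UnknownTopicWarnOnce.class.getName());

    // Topics we have already warned about as unknown; avoids repeating the same warning forever.
    private final Set<String> alreadyWarned = ConcurrentHashMap.newKeySet();

    public void logMetadataErrors(Map<String, String> topicErrors) {
        topicErrors.forEach((topic, error) -> {
            boolean unknownTopic = "UNKNOWN_TOPIC_OR_PARTITION".equals(error);
            if (unknownTopic && !alreadyWarned.add(topic)) {
                // Seen before and still just "doesn't exist yet": demote to a debug-level message.
                log.log(Level.FINE, () -> "Topic " + topic + " still unknown, suppressing repeated warning");
            } else {
                // First occurrence, or any other error reason: keep the warning.
                log.warning("Error while fetching metadata for topic " + topic + ": " + error);
            }
        });
    }
}
{code}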



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-09-28 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531011#comment-15531011
 ] 

Andrew Olson commented on KAFKA-3990:
-

I think this Jira can be closed as a duplicate of KAFKA-2512; version and magic 
byte verification should address this. We saw the same thing with the new 
Consumer when its bootstrap.servers was accidentally set to the host:port of a 
Kafka Offset Monitor (https://github.com/quantifind/KafkaOffsetMonitor) service.
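
For illustration only, here is a rough sketch of the general defensive idea of 
sanity-checking a size prefix before allocating, assuming a hypothetical maxReceiveSize 
limit; this is not the actual NetworkReceive fix:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class BoundedReceive {

    private final int maxReceiveSize; // hypothetical upper bound, e.g. 100 MB

    public BoundedReceive(int maxReceiveSize) {
        this.maxReceiveSize = maxReceiveSize;
    }

    // Reads a 4-byte size prefix and refuses to allocate a buffer for implausible sizes.
    // If the peer is not actually a Kafka broker (e.g. an HTTP server), the "size" is
    // really the first four bytes of its response and can be enormous.
    public ByteBuffer readSizedPayload(ReadableByteChannel channel) throws IOException {
        ByteBuffer sizeBuffer = ByteBuffer.allocate(4);
        while (sizeBuffer.hasRemaining()) {
            if (channel.read(sizeBuffer) < 0) {
                throw new IOException("Connection closed while reading size prefix");
            }
        }
        sizeBuffer.flip();
        int size = sizeBuffer.getInt();
        if (size < 0 || size > maxReceiveSize) {
            throw new IOException("Invalid payload size " + size + ", refusing to allocate");
        }
        ByteBuffer payload = ByteBuffer.allocate(size);
        while (payload.hasRemaining()) {
            if (channel.read(payload) < 0) {
                throw new IOException("Connection closed while reading payload");
            }
        }
        payload.flip();
        return payload;
    }
}
{code}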

> Kafka New Producer may raise an OutOfMemoryError
> 
>
> Key: KAFKA-3990
> URL: https://issues.apache.org/jira/browse/KAFKA-3990
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
> Environment: Docker, Base image : CentOS
> Java 8u77
> Marathon
>Reporter: Brice Dutheil
> Attachments: app-producer-config.log, kafka-broker-logs.zip
>
>
> We are regularly seeing OOME errors on a kafka producer, we first saw :
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[na:1.8.0_77]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> This line refer to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
> (see 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)
> Usually the app runs fine within 200/400 MB heap and a 64 MB Metaspace. And 
> we are producing small messages 500B at most.
> Also the error don't appear on the devlopment environment, in order to 
> identify the issue we tweaked the code to give us actual data of the 
> allocation size, we got this stack :
> {code}
> 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
> 09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : 
> NetworkReceive.readFromReadableChannel.receiveSize=1213486160
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to /tmp/tomcat.hprof ...
> Heap dump file created [69583827 bytes in 0.365 secs]
> 09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR 
> o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread 
> | producer-1: 
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[na:1.8.0_77]
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-client

[jira] [Updated] (KAFKA-4180) Shared authentication with multiple active Kafka producers/consumers

2016-10-26 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-4180:

Summary: Shared authentication with multiple active Kafka 
producers/consumers  (was: Shared authentification with multiple actives Kafka 
producers/consumers)

> Shared authentication with multiple active Kafka producers/consumers
> 
>
> Key: KAFKA-4180
> URL: https://issues.apache.org/jira/browse/KAFKA-4180
> Project: Kafka
>  Issue Type: Bug
>  Components: producer , security
>Affects Versions: 0.10.0.1
>Reporter: Guillaume Grossetie
>Assignee: Mickael Maison
>  Labels: authentication, jaas, loginmodule, plain, producer, 
> sasl, user
>
> I'm using Kafka 0.10.0.1 with an SASL authentication on the client:
> {code:title=kafka_client_jaas.conf|borderStyle=solid}
> KafkaClient {
> org.apache.kafka.common.security.plain.PlainLoginModule required
> username="guillaume"
> password="secret";
> };
> {code}
> When using multiple Kafka producers the authentification is shared [1]. In 
> other words it's not currently possible to have multiple Kafka producers in a 
> JVM process.
> Am I missing something ? How can I have multiple active Kafka producers with 
> different credentials ?
> My use case is that I have an application that send messages to multiples 
> clusters (one cluster for logs, one cluster for metrics, one cluster for 
> business data).
> [1] 
> https://github.com/apache/kafka/blob/69ebf6f7be2fc0e471ebd5b7a166468017ff2651/clients/src/main/java/org/apache/kafka/common/security/authenticator/LoginManager.java#L35
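
For reference, newer client versions support a per-client sasl.jaas.config property 
(introduced by KIP-85), which avoids the JVM-wide static JAAS file and lets producers 
with different credentials coexist in one process. A minimal sketch, with placeholder 
bootstrap servers and credentials:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.config.SaslConfigs;
import org.apache.kafka.common.serialization.StringSerializer;

public class PerClientJaasExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "logs-cluster:9093"); // placeholder host
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put(SaslConfigs.SASL_MECHANISM, "PLAIN");
        // Credentials scoped to this producer instance only, instead of the JVM-wide JAAS file.
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"guillaume\" password=\"secret\";");

        KafkaProducer<String, String> logsProducer = new KafkaProducer<>(props);
        // A second producer with a different sasl.jaas.config can coexist in the same JVM.
        logsProducer.close();
    }
}
{code}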



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4362) Consumer can fail after reassignment of the offsets topic partition

2016-11-02 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628871#comment-15628871
 ] 

Andrew Olson commented on KAFKA-4362:
-

Is there documentation that advises that, for a COORDINATOR_NOT_AVAILABLE error 
when committing offsets, the KafkaConsumer should be closed and a new instance 
created? Or is the expectation that any CommitFailedException should be handled 
in that way? We have seen some cases where a call to poll() before retrying a 
failed commit due to ILLEGAL_GENERATION or UNKNOWN_MEMBER_ID allows the 
consumer to recover.
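
For what it's worth, the recovery path we observed looks roughly like the sketch below 
(written against the newer poll(Duration) overload; whether retrying the same offsets is 
appropriate depends on whether the partitions are still assigned after the rejoin):

{code:java}
import java.time.Duration;
import java.util.Map;
import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitRetryExample {

    // Attempts the commit once more after letting poll() drive the rebalance protocol
    // so the consumer can rejoin the group.
    static void commitWithOneRetry(KafkaConsumer<String, String> consumer,
                                   Map<TopicPartition, OffsetAndMetadata> offsets) {
        try {
            consumer.commitSync(offsets);
        } catch (CommitFailedException first) {
            // poll() lets the consumer rejoin; the duration here is only illustrative.
            consumer.poll(Duration.ofMillis(100));
            consumer.commitSync(offsets); // may still fail; handle or re-raise as needed
        }
    }
}
{code}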

> Consumer can fail after reassignment of the offsets topic partition
> ---
>
> Key: KAFKA-4362
> URL: https://issues.apache.org/jira/browse/KAFKA-4362
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Joel Koshy
>Assignee: Mayuresh Gharat
>
> When a consumer offsets topic partition reassignment completes, an offset 
> commit shows this:
> {code}
> java.lang.IllegalArgumentException: Message format version for partition 100 
> not found
> at 
> kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
>  ~[kafka_2.10.jar:?]
> at 
> kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633)
>  ~[kafka_2.10.jar:?]
> at scala.Option.getOrElse(Option.scala:120) ~[scala-library-2.10.4.jar:?]
> at 
> kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632)
>  ~[kafka_2.10.jar:?]
> at 
> ...
> {code}
> The issue is that the replica has been deleted so the 
> {{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} throws this 
> exception instead which propagates as an unknown error.
> Unfortunately consumers don't respond to this and will fail their offset 
> commits.
> One workaround in the above situation is to bounce the cluster - the consumer 
> will be forced to rediscover the group coordinator.
> (Incidentally, the message incorrectly prints the number of partitions 
> instead of the actual partition.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-7631) NullPointerException when SCRAM is allowed but ScramLoginModule is not in broker's jaas.conf

2023-11-22 Thread Andrew Olson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson resolved KAFKA-7631.
-
Resolution: Fixed

Marking as resolved since I believe it is.

> NullPointerException when SCRAM is allowed but ScramLoginModule is not in 
> broker's jaas.conf
> ---
>
> Key: KAFKA-7631
> URL: https://issues.apache.org/jira/browse/KAFKA-7631
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.0.0, 2.5.0
>Reporter: Andras Beni
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: KAFKA-7631.patch
>
>
> When user wants to use delegation tokens and lists {{SCRAM}} in 
> {{sasl.enabled.mechanisms}}, but does not add {{ScramLoginModule}} to 
> broker's JAAS configuration, a null pointer exception is thrown on broker 
> side and the connection is closed.
> Meaningful error message should be logged and sent back to the client.
> {code}
> java.lang.NullPointerException
> at 
> org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.handleSaslToken(SaslServerAuthenticator.java:376)
> at 
> org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:262)
> at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:127)
> at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:489)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:427)
> at kafka.network.Processor.poll(SocketServer.scala:679)
> at kafka.network.Processor.run(SocketServer.scala:584)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-9233) Kafka consumer throws undocumented IllegalStateException

2019-11-26 Thread Andrew Olson (Jira)
Andrew Olson created KAFKA-9233:
---

 Summary: Kafka consumer throws undocumented IllegalStateException
 Key: KAFKA-9233
 URL: https://issues.apache.org/jira/browse/KAFKA-9233
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 2.3.0
Reporter: Andrew Olson


If the provided collection of TopicPartition instances contains any duplicates, 
an IllegalStateException not documented in the javadoc is thrown by internal 
Java stream code when calling KafkaConsumer#beginningOffsets or 
KafkaConsumer#endOffsets.

The stack trace looks like this,

{noformat}
java.lang.IllegalStateException: Duplicate key -2
at 
java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
at java.util.HashMap.merge(HashMap.java:1254)
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at 
org.apache.kafka.clients.consumer.internals.Fetcher.beginningOrEndOffset(Fetcher.java:555)
at 
org.apache.kafka.clients.consumer.internals.Fetcher.beginningOffsets(Fetcher.java:542)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.beginningOffsets(KafkaConsumer.java:2054)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.beginningOffsets(KafkaConsumer.java:2031)
{noformat}

{noformat}
java.lang.IllegalStateException: Duplicate key -1
at 
java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
at java.util.HashMap.merge(HashMap.java:1254)
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at 
org.apache.kafka.clients.consumer.internals.Fetcher.beginningOrEndOffset(Fetcher.java:559)
at 
org.apache.kafka.clients.consumer.internals.Fetcher.endOffsets(Fetcher.java:550)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.endOffsets(KafkaConsumer.java:2109)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.endOffsets(KafkaConsumer.java:2081)
{noformat}

Looking at the code, it appears this was likely introduced by 
KAFKA-7831. The exception is not thrown in Kafka 2.2.1, with the duplicated 
TopicPartition values silently ignored. Either we should document this 
exception possibility (probably wrapping it with a Kafka exception class) 
indicating invalid client API usage, or restore the previous behavior where the 
duplicates were harmless.
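
Until that is decided, a simple caller-side workaround is to collapse duplicates before 
calling the offset APIs; a minimal sketch:

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class DedupOffsetsLookup {

    // Collapse duplicate TopicPartition entries into a Set before calling the offset APIs,
    // since duplicates are what trigger the undocumented IllegalStateException.
    static Map<TopicPartition, Long> safeBeginningOffsets(KafkaConsumer<?, ?> consumer,
                                                          List<TopicPartition> partitions) {
        Set<TopicPartition> unique = new HashSet<>(partitions); // TopicPartition implements equals/hashCode
        return consumer.beginningOffsets(unique);
    }
}
{code}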



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-2114) Unable to change min.insync.replicas default

2015-09-09 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737079#comment-14737079
 ] 

Andrew Olson commented on KAFKA-2114:
-

Yes, that is correct.

> Unable to change min.insync.replicas default
> 
>
> Key: KAFKA-2114
> URL: https://issues.apache.org/jira/browse/KAFKA-2114
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0, 0.8.2.1
>Reporter: Bryan Baugher
>Assignee: Gwen Shapira
> Fix For: 0.8.3
>
> Attachments: KAFKA-2114.patch
>
>
> Following the comment here[1] I was unable to change the min.insync.replicas 
> default value. I tested this by setting up a 3 node cluster, wrote to a topic 
> with a replication factor of 3, using request.required.acks=-1 and setting 
> min.insync.replicas=2 on the broker's server.properties. I then shut down 2 
> brokers but I was still able to write successfully. Only after running the 
> alter topic command setting min.insync.replicas=2 on the topic did I see 
> write failures.
> [1] - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3CCANZ-JHF71yqKE6%2BKKhWe2EGUJv6R3bTpoJnYck3u1-M35sobgg%40mail.gmail.com%3E
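
For reference, with modern clients the per-topic override described above can also be 
applied programmatically through the Admin API; a minimal sketch with placeholder broker 
and topic names:

{code:java}
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicMinIsrOverride {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Per-topic override of min.insync.replicas, independent of the broker default.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // placeholder
            AlterConfigOp setMinIsr =
                    new AlterConfigOp(new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET);

            Map<ConfigResource, Collection<AlterConfigOp>> updates = new HashMap<>();
            updates.put(topic, Collections.singletonList(setMinIsr));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}
{code}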



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2114) Unable to change min.insync.replicas default

2015-09-09 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-2114:

Affects Version/s: 0.8.2.0
   0.8.2.1

> Unable to change min.insync.replicas default
> 
>
> Key: KAFKA-2114
> URL: https://issues.apache.org/jira/browse/KAFKA-2114
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0, 0.8.2.1
>Reporter: Bryan Baugher
>Assignee: Gwen Shapira
> Fix For: 0.8.3
>
> Attachments: KAFKA-2114.patch
>
>
> Following the comment here[1] I was unable to change the min.insync.replicas 
> default value. I tested this by setting up a 3 node cluster, wrote to a topic 
> with a replication factor of 3, using request.required.acks=-1 and setting 
> min.insync.replicas=2 on the broker's server.properties. I then shut down 2 
> brokers but I was still able to write successfully. Only after running the 
> alter topic command setting min.insync.replicas=2 on the topic did I see 
> write failures.
> [1] - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3CCANZ-JHF71yqKE6%2BKKhWe2EGUJv6R3bTpoJnYck3u1-M35sobgg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2434) remove roundrobin identical topic constraint in consumer coordinator (old API)

2015-09-30 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936850#comment-14936850
 ] 

Andrew Olson commented on KAFKA-2434:
-

[~junrao] Yes, this was only intended for parity with the original bug-fix 
implementation in KAFKA-2196. I noticed the distribution uniformity deficiency 
as well and consequently also submitted a patch for a new "fair" algorithm - 
see KAFKA-2435.


> remove roundrobin identical topic constraint in consumer coordinator (old API)
> --
>
> Key: KAFKA-2434
> URL: https://issues.apache.org/jira/browse/KAFKA-2434
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Andrew Olson
>Assignee: Andrew Olson
> Attachments: KAFKA-2434.patch
>
>
> The roundrobin strategy algorithm improvement made in KAFKA-2196 should be 
> applied to the original high-level consumer API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2189) Snappy compression of message batches less efficient in 0.8.2.1

2015-10-08 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948989#comment-14948989
 ] 

Andrew Olson commented on KAFKA-2189:
-

[~jbrosenb...@gmail.com] Yes, your assumption is correct; this was only an 
efficiency/chunking issue and not any protocol incompatibility.

> Snappy compression of message batches less efficient in 0.8.2.1
> ---
>
> Key: KAFKA-2189
> URL: https://issues.apache.org/jira/browse/KAFKA-2189
> Project: Kafka
>  Issue Type: Bug
>  Components: build, compression, log
>Affects Versions: 0.8.2.1
>Reporter: Olson,Andrew
>Assignee: Ismael Juma
>Priority: Blocker
>  Labels: trivial
> Fix For: 0.9.0.0, 0.8.2.2
>
> Attachments: KAFKA-2189.patch
>
>
> We are using snappy compression and noticed a fairly substantial increase 
> (about 2.25x) in log filesystem space consumption after upgrading a Kafka 
> cluster from 0.8.1.1 to 0.8.2.1. We found that this is caused by messages 
> being seemingly recompressed individually (or possibly with a much smaller 
> buffer or dictionary?) instead of as a batch as sent by producers. We 
> eventually tracked down the change in compression ratio/scope to this [1] 
> commit that updated the snappy version from 1.0.5 to 1.1.1.3. The Kafka 
> client version does not appear to be relevant as we can reproduce this with 
> both the 0.8.1.1 and 0.8.2.1 Producer.
> Here are the log files from our troubleshooting that contain the same set of 
> 1000 messages, for batch sizes of 1, 10, 100, and 1000. f9d9b was the last 
> commit with 0.8.1.1-like behavior prior to f5ab8 introducing the issue.
> {noformat}
> -rw-rw-r-- 1 kafka kafka 404967 May 12 11:45 
> /var/kafka2/f9d9b-batch-1-0/.log
> -rw-rw-r-- 1 kafka kafka 119951 May 12 11:45 
> /var/kafka2/f9d9b-batch-10-0/.log
> -rw-rw-r-- 1 kafka kafka  89645 May 12 11:45 
> /var/kafka2/f9d9b-batch-100-0/.log
> -rw-rw-r-- 1 kafka kafka  88279 May 12 11:45 
> /var/kafka2/f9d9b-batch-1000-0/.log
> -rw-rw-r-- 1 kafka kafka 402837 May 12 11:41 
> /var/kafka2/f5ab8-batch-1-0/.log
> -rw-rw-r-- 1 kafka kafka 382437 May 12 11:41 
> /var/kafka2/f5ab8-batch-10-0/.log
> -rw-rw-r-- 1 kafka kafka 364791 May 12 11:41 
> /var/kafka2/f5ab8-batch-100-0/.log
> -rw-rw-r-- 1 kafka kafka 380693 May 12 11:41 
> /var/kafka2/f5ab8-batch-1000-0/.log
> {noformat}
> [1] 
> https://github.com/apache/kafka/commit/f5ab8e1780cf80f267906e3259ad4f9278c32d28
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-02-24 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910490#comment-13910490
 ] 

Andrew Olson commented on KAFKA-1028:
-

[~nehanarkhede], can you respond to my latest comment on the reviewboard? 
Thanks!

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>Assignee: Neha Narkhede
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-03 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-1028:


Attachment: KAFKA-1028_2014-03-03_18:48:43.patch

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>Assignee: Neha Narkhede
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-03 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918842#comment-13918842
 ] 

Andrew Olson commented on KAFKA-1028:
-

Updated reviewboard https://reviews.apache.org/r/17537/
 against branch origin/trunk

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>Assignee: Neha Narkhede
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-03 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918871#comment-13918871
 ] 

Andrew Olson commented on KAFKA-1028:
-

Patch has been updated with the suggested changes.

Note that I made one additional change which was found to be necessary after 
rebasing against the latest trunk code (detected by integration test failures). 
It appears that the recent KAFKA-1235 changes result in a Leader of -1 (none) 
and empty ISR being saved for all partitions after all brokers in the cluster 
have been shut down using a controlled shutdown. This essentially forces an 
unclean leader election to occur when a broker is subsequently restarted.  If 
we have the configuration set to disallow unclean elections, then we have 
permanently blocked our ability to restore a leader.

To address this, I've added logic that prevents the ISR from being updated to 
an empty set for any topics that do not allow unclean election. The last 
surviving member of the ISR will be preserved in Zookeeper in the event that a 
final, lone leader broker goes offline -- allowing this previous leader to 
resume its leader role when it comes back online. The partition leader is still 
updated to -1, which is not problematic.

Please review lines 975-986 of KafkaController and let me know if this change 
makes sense.
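
Conceptually, the added guard behaves like the following sketch (illustrative Java only; 
the actual change lives in KafkaController.scala):

{code:java}
import java.util.List;
import java.util.stream.Collectors;

public class IsrShrinkSketch {

    // When the last ISR member goes offline for a topic that disallows unclean leader
    // election, keep that member in the ISR so it can resume leadership when it returns,
    // instead of persisting an empty ISR.
    static List<Integer> newIsrAfterBrokerFailure(List<Integer> currentIsr,
                                                  int failedBrokerId,
                                                  boolean uncleanLeaderElectionEnabled) {
        List<Integer> shrunk = currentIsr.stream()
                .filter(b -> b != failedBrokerId)
                .collect(Collectors.toList());
        if (shrunk.isEmpty() && !uncleanLeaderElectionEnabled) {
            // Preserve the last surviving replica rather than writing an empty ISR to ZooKeeper.
            return currentIsr;
        }
        return shrunk;
    }
}
{code}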

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>Assignee: Neha Narkhede
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-17 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-1028:


Attachment: KAFKA-1028_2014-03-17_09:39:05.patch

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>Assignee: Neha Narkhede
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-17 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937865#comment-13937865
 ] 

Andrew Olson commented on KAFKA-1028:
-

Updated reviewboard https://reviews.apache.org/r/17537/
 against branch origin/trunk

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>Assignee: Neha Narkhede
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-25 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946673#comment-13946673
 ] 

Andrew Olson commented on KAFKA-1028:
-

[~jkreps]/[~nehanarkhede] Would it be possible for this to be included in the 
0.8.2 release?

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>    Assignee: Andrew Olson
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-25 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946686#comment-13946686
 ] 

Andrew Olson commented on KAFKA-1028:
-

Sounds good, thanks.

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>    Assignee: Andrew Olson
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-26 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948159#comment-13948159
 ] 

Andrew Olson commented on KAFKA-1028:
-

The tests included with the patch do verify consistency and non-availability, 
but they just shut down and restart brokers rather than simulating a temporary 
network partition. A few months ago I tried to get Jepsen+Kafka up and running, 
without much success. I agree that a system test would be good to have. I'll 
take a look at that framework to see how feasible it would be to add some tests 
for this.

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>Assignee: Andrew Olson
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1179) createMessageStreams() in javaapi.ZookeeperConsumerConnector does not throw

2014-03-28 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950899#comment-13950899
 ] 

Andrew Olson commented on KAFKA-1179:
-

Using the Java API, calling createMessageStreams(...) multiple times using the 
same consumer connector instance appears to cause the consumer to hang in 
registerConsumerInZK(...), since a fresh timestamp value is generated for the 
consumer subscription. At a minimum, it should be clearly documented that this 
is not valid usage of the API.

The following messages are continually logged.
{code}
kafka.utils.ZkUtils$ - I wrote this conflicted ephemeral node 
[{"version":1,"subscription":{"topic":1},"pattern":"static","timestamp":"1396019252998"}]
 at /consumers/test/ids/test_MAC-AO6517-1396019241689-631e0040 a while back in 
a different session, hence I will backoff for this node to be deleted by 
Zookeeper and retry
2014-03-28 10:08:39,086 [main] INFO  kafka.utils.ZkUtils$ - conflict in 
/consumers/test/ids/test_MAC-AO6517-1396019241689-631e0040 data: 
{"version":1,"subscription":{"topic":1},"pattern":"static","timestamp":"1396019252998"}
 stored data: 
{"version":1,"subscription":{"topic":1},"pattern":"static","timestamp":"1396019241812"}
{code}

We're using version 0.8.1.
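
For reference, the supported usage is a single createMessageStreams call that covers 
every topic of interest; a minimal sketch against the 0.8 high-level Java consumer API, 
with placeholder configuration values:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class SingleCreateMessageStreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "test");

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Request streams for every topic of interest in one call; invoking
        // createMessageStreams a second time on the same connector is not supported.
        Map<String, Integer> topicCountMap = new HashMap<>();
        topicCountMap.put("topicA", 1);
        topicCountMap.put("topicB", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);
    }
}
{code}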

> createMessageStreams() in javaapi.ZookeeperConsumerConnector does not throw
> ---
>
> Key: KAFKA-1179
> URL: https://issues.apache.org/jira/browse/KAFKA-1179
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.0
>Reporter: Vincent Rischmann
>Assignee: Neha Narkhede
>
> In kafka.consumer.javaapi.ZookeeperConsumerConnector.scala, the 
> createMessageStreams() directly calls underlying.consume() (line 80)
> In kafka.consumer.ZookeeperConsumerConnector.scala, the 
> createMessageStreams() throws an exception if it has been called more than 
> once (line 133). 
> The javaapi should throw if it is called more than once, just like the scala 
> api.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-28 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-1028:


Fix Version/s: 0.8.2

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>    Assignee: Andrew Olson
> Fix For: 0.8.2
>
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1028) per topic configuration of preference for consistency over availability

2014-03-28 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951323#comment-13951323
 ] 

Andrew Olson commented on KAFKA-1028:
-

Thanks for the info. Hopefully I'll have some time in the next week or two to 
tackle this, should be a fun project.

> per topic configuration of preference for consistency over availability
> ---
>
> Key: KAFKA-1028
> URL: https://issues.apache.org/jira/browse/KAFKA-1028
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Scott Clasen
>Assignee: Andrew Olson
> Fix For: 0.8.2
>
> Attachments: KAFKA-1028.patch, KAFKA-1028_2014-01-30_13:45:30.patch, 
> KAFKA-1028_2014-03-03_18:48:43.patch, KAFKA-1028_2014-03-17_09:39:05.patch
>
>
> As discussed with Neha on the ML.
> It should be possible to configure a topic to disallow unclean leader 
> election, thus preventing the situation where committed messages can be 
> discarded once a failed leader comes back online in a situation where it was 
> the only ISR.
> This would open Kafka to additional use cases where the possibility of 
> committed messages being discarded is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-493) High CPU usage on inactive server

2014-04-15 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970024#comment-13970024
 ] 

Andrew Olson commented on KAFKA-493:


Due to some concerns about partition count scalability, I set up a test that 
created a large number of idle topics (4 partitions per topic, replication 
factor = 3), periodically pausing topic creation to inspect the overall health 
of the Kafka cluster. I noticed that the CPU usage increased proportionally to 
the total number of topics/partitions.

- With 50 topics and 200 partitions per broker, CPU consumption was averaging 
about 5% of a core on each system.
- With 500 topics and 2000 partitions per broker, CPU consumption was averaging 
about 50% of a core, occasionally spiking up to 70%.
- With 1000 topics and 4000 partitions per broker, CPU consumption started to 
approach 100% of a core.

VisualVM showed that most of the time was being spent in epollWait.

The Kafka cluster for this test is using Java version 1.7.0_25 on Redhat 6.2.

> High CPU usage on inactive server
> -
>
> Key: KAFKA-493
> URL: https://issues.apache.org/jira/browse/KAFKA-493
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0
>Reporter: Jay Kreps
> Fix For: 0.8.2
>
> Attachments: stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
> > usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see 
> > anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two 
> > machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers 
> > there is no idle CPU activity. 
> > Then I run a test that essentially does 10k publish operations followed by 
> > immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to 
> > consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", 
> group="system")
> THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 21, name="main", group="main")
> THREAD START (obj=53ae, id = 27, name="Thread-2", group="main")
> THREAD START (obj=53ae, id = 28, name="Thread-3", group="main")
> THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", 
> group="main")
> THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", 
> group="main")
> THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=574b, id = 200012, 
> name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=576e, id = 200014, name="main-SendThread()", 
> group="main")
> THREAD START (obj=576d, id = 200013, name="main-EventThread", 
> group="main")
> THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", 
> group="main")
> THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", 
> group="main")
> THREAD START (obj=53ae, id = 200017, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200018, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", 
> group="main")
> THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", 
> group="main")
> THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on 
> broker 1, ", group="main

[jira] [Updated] (KAFKA-493) High CPU usage on inactive server

2014-04-17 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-493:
---

Attachment: Kafka-trace1.zip
Kafka-trace2.zip
Kafka-trace3.zip

> High CPU usage on inactive server
> -
>
> Key: KAFKA-493
> URL: https://issues.apache.org/jira/browse/KAFKA-493
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0
>Reporter: Jay Kreps
> Fix For: 0.8.2
>
> Attachments: Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, 
> stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
> > usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see 
> > anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two 
> > machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers 
> > there is no idle CPU activity. 
> > Then I run a test that essentially does 10k publish operations followed by 
> > immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to 
> > consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", 
> group="system")
> THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 21, name="main", group="main")
> THREAD START (obj=53ae, id = 27, name="Thread-2", group="main")
> THREAD START (obj=53ae, id = 28, name="Thread-3", group="main")
> THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", 
> group="main")
> THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", 
> group="main")
> THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=574b, id = 200012, 
> name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=576e, id = 200014, name="main-SendThread()", 
> group="main")
> THREAD START (obj=576d, id = 200013, name="main-EventThread", 
> group="main")
> THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", 
> group="main")
> THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", 
> group="main")
> THREAD START (obj=53ae, id = 200017, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200018, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", 
> group="main")
> THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", 
> group="main")
> THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on 
> broker 1, ", group="main")
> THREAD START (obj=53ae, id = 200028, name="SIGINT handler", 
> group="system")
> THREAD START (obj=53ae, id = 200029, name="Thread-5", group="main")
> THREAD START (obj=574b, id = 200030, name="Thread-1", group="main")
> THREAD START (obj=574b, id = 200031, name="Thread-0", group="main")
> THREAD END (id = 200031)
> THREAD END (id = 200029)
> THREAD END (id = 200020)
> THREAD END (id = 200019)
> THREAD END (id = 28)
> THREAD END (id = 200021)
> THREAD END (id = 27)
>

[jira] [Commented] (KAFKA-493) High CPU usage on inactive server

2014-04-17 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972963#comment-13972963
 ] 

Andrew Olson commented on KAFKA-493:


Repeated the previous test using YourKit for profiling, trace snapshot files 
are attached. Will post a summary of my findings later this morning. The 
epollWait does not appear to be a problem.

> High CPU usage on inactive server
> -
>
> Key: KAFKA-493
> URL: https://issues.apache.org/jira/browse/KAFKA-493
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0
>Reporter: Jay Kreps
> Fix For: 0.8.2
>
> Attachments: Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, 
> stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
> > usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see 
> > anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two 
> > machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers 
> > there is no idle CPU activity. 
> > Then I run a test that essentially does 10k publish operations followed by 
> > immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to 
> > consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", 
> group="system")
> THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 21, name="main", group="main")
> THREAD START (obj=53ae, id = 27, name="Thread-2", group="main")
> THREAD START (obj=53ae, id = 28, name="Thread-3", group="main")
> THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", 
> group="main")
> THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", 
> group="main")
> THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=574b, id = 200012, 
> name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=576e, id = 200014, name="main-SendThread()", 
> group="main")
> THREAD START (obj=576d, id = 200013, name="main-EventThread", 
> group="main")
> THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", 
> group="main")
> THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", 
> group="main")
> THREAD START (obj=53ae, id = 200017, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200018, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", 
> group="main")
> THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", 
> group="main")
> THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on 
> broker 1, ", group="main")
> THREAD START (obj=53ae, id = 200028, name="SIGINT handler", 
> group="system")
> THREAD START (obj=53ae, id = 200029, name="Thread-5", group="main")
> THREAD START (obj=574b, id = 200030, name="Thread-1", group="main")
> THREAD START (obj=574b, id = 200031, name="Thread-0", group="main")
> THREAD END (id = 200031)
> THREA

[jira] [Updated] (KAFKA-493) High CPU usage on inactive server

2014-04-17 Thread Andrew Olson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated KAFKA-493:
---

Attachment: Kafka-sampling1.zip
Kafka-sampling2.zip
Kafka-sampling3.zip

Also attaching sampling-based profiling data, since the tracing-based profiling 
data may be a bit skewed due to the trace overhead.

> High CPU usage on inactive server
> -
>
> Key: KAFKA-493
> URL: https://issues.apache.org/jira/browse/KAFKA-493
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0
>Reporter: Jay Kreps
> Fix For: 0.8.2
>
> Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, 
> Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, 
> stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
> > usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see 
> > anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two 
> > machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers 
> > there is no idle CPU activity. 
> > Then I run a test that essentially does 10k publish operations followed by 
> > immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to 
> > consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", 
> group="system")
> THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 21, name="main", group="main")
> THREAD START (obj=53ae, id = 27, name="Thread-2", group="main")
> THREAD START (obj=53ae, id = 28, name="Thread-3", group="main")
> THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", 
> group="main")
> THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", 
> group="main")
> THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=574b, id = 200012, 
> name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=576e, id = 200014, name="main-SendThread()", 
> group="main")
> THREAD START (obj=576d, id = 200013, name="main-EventThread", 
> group="main")
> THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", 
> group="main")
> THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", 
> group="main")
> THREAD START (obj=53ae, id = 200017, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200018, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", 
> group="main")
> THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", 
> group="main")
> THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on 
> broker 1, ", group="main")
> THREAD START (obj=53ae, id = 200028, name="SIGINT handler", 
> group="system")
> THREAD START (obj=53ae, id = 200029, name="Thread-5", group="main")
> THREAD START (obj=574b, id = 200030, name="Thread-1", group="main")
> THREAD START (obj=574b, id = 200031, name="Thread-0", group="ma

[jira] [Commented] (KAFKA-493) High CPU usage on inactive server

2014-04-17 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973114#comment-13973114
 ] 

Andrew Olson commented on KAFKA-493:


Looks like a large amount of time is spent in the ensureTopicExists method, which 
iterates over the metadata cache map until it finds a matching topic. This is 
especially problematic (O(n²) overall) when processing requests that reference a 
large number of partitions, such as the replica fetch requests that are 
continually sent.

Fortunately this appears to have already been fixed in trunk/0.8.1.1 by 
KAFKA-1356, which changed the metadata cache from a 
Map[TopicAndPartition, PartitionStateInfo] to a 
Map[Topic, Map[Partition, PartitionStateInfo]].
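
To make the lookup difference concrete, here is a minimal sketch in Scala (not the 
actual broker code; the case classes are simplified stand-ins for Kafka's internal 
types) of a topic-existence check against the old flat cache versus the new nested 
cache:

    object MetadataCacheSketch {
      // Simplified stand-ins for the broker's internal metadata types.
      case class TopicAndPartition(topic: String, partition: Int)
      case class PartitionStateInfo(leaderEpoch: Int)

      // Before KAFKA-1356: the check scans the cached partition keys until one
      // matches, so validating a request that names p partitions against a
      // cache of n partitions costs O(p * n) overall.
      def topicExistsFlat(cache: Map[TopicAndPartition, PartitionStateInfo],
                          topic: String): Boolean =
        cache.keys.exists(_.topic == topic)

      // After KAFKA-1356: the cache is keyed by topic first, so each check is
      // a single hash lookup and the same request costs O(p) overall.
      def topicExistsNested(cache: Map[String, Map[Int, PartitionStateInfo]],
                            topic: String): Boolean =
        cache.contains(topic)
    }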

> High CPU usage on inactive server
> -
>
> Key: KAFKA-493
> URL: https://issues.apache.org/jira/browse/KAFKA-493
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0
>Reporter: Jay Kreps
> Fix For: 0.8.2
>
> Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, 
> Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, 
> stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
> > usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see 
> > anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two 
> > machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers 
> > there is no idle CPU activity. 
> > Then I run a test that essential does 10k publish operations followed by 
> > immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to 
> > consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=53ae, id = 24, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 25, name="RMI TCP Accept-", 
> group="system")
> THREAD START (obj=53ae, id = 26, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=53ae, id = 21, name="main", group="main")
> THREAD START (obj=53ae, id = 27, name="Thread-2", group="main")
> THREAD START (obj=53ae, id = 28, name="Thread-3", group="main")
> THREAD START (obj=53ae, id = 29, name="kafka-processor-9092-0", 
> group="main")
> THREAD START (obj=53ae, id = 200010, name="kafka-processor-9092-1", 
> group="main")
> THREAD START (obj=53ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=574b, id = 200012, 
> name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=576e, id = 200014, name="main-SendThread()", 
> group="main")
> THREAD START (obj=576d, id = 200013, name="main-EventThread", 
> group="main")
> THREAD START (obj=53ae, id = 200015, name="metrics-meter-tick-thread-1", 
> group="main")
> THREAD START (obj=53ae, id = 200016, name="metrics-meter-tick-thread-2", 
> group="main")
> THREAD START (obj=53ae, id = 200017, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200018, name="request-expiration-task", 
> group="main")
> THREAD START (obj=53ae, id = 200019, name="kafka-request-handler-0", 
> group="main")
> THREAD START (obj=53ae, id = 200020, name="kafka-request-handler-1", 
> group="main")
> THREAD START (obj=53ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=53ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=5899, id = 200023, name="ReplicaFetcherThread-0-2 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200024, name="ReplicaFetcherThread-0-3 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200025, name="ReplicaFetcherThread-0-0 on 
> broker 1, ", group="main")
> THREAD START (obj=5899, id = 200026, name="ReplicaFetcherThread-0-1 on 
> broker 1, ", group="main")
> THREAD START (obj=53ae, id = 
