[jira] [Commented] (KAFKA-2132) Move Log4J appender to clients module

2015-04-26 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513061#comment-14513061
 ] 

Jay Kreps commented on KAFKA-2132:
--

Hey [~singhashish], I think I may have caused confusion.

The current state is:
  clients -- depends on slf4j, not log4j
  core -- depends on log4j

I think the log4j dependency in core is okay because it mostly gets run as a 
service rather than as an embedded library (though that does happen too).

People were very, uh, vocal about the desire to not have the client package 
depend on log4j. We can't move this appender class to the clients package 
without reintroducing the log4j dependency and incurring the wrath of the java 
logging people. :-)

I recommend we make a stand-alone module called log4j/ that has this one class. 
It is a bit silly to have a single class in its own module, but I think that 
does address the goal of this ticket. I recommend we also move the class to 
Java so people don't have the scala dependency either.

Thoughts?
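
(For context, the single class being discussed is essentially just a log4j 
appender that hands each log event to the new producer. A rough sketch of that 
shape is below - the class name, option names and error handling are made up for 
illustration and this is not the actual Kafka source:)

{code}
import java.util.Properties
import scala.beans.BeanProperty
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.log4j.AppenderSkeleton
import org.apache.log4j.spi.LoggingEvent

// Sketch of a log4j appender that simply wraps the new (clients-module) producer.
class SketchKafkaLog4jAppender extends AppenderSkeleton {
  @BeanProperty var brokerList: String = _  // set via log4j config, e.g. log4j.appender.K.brokerList=host:9092
  @BeanProperty var topic: String = _

  private var producer: KafkaProducer[Array[Byte], Array[Byte]] = _

  override def activateOptions(): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.ByteArraySerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.ByteArraySerializer")
    producer = new KafkaProducer[Array[Byte], Array[Byte]](props)
  }

  // Each log event becomes one produced record; "the appender is just a producer".
  override def append(event: LoggingEvent): Unit = {
    val message = if (getLayout == null) event.getRenderedMessage else getLayout.format(event)
    producer.send(new ProducerRecord[Array[Byte], Array[Byte]](topic, message.getBytes("UTF-8")))
  }

  override def close(): Unit = if (producer != null) producer.close()
  override def requiresLayout(): Boolean = false
}
{code}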

> Move Log4J appender to clients module
> -
>
> Key: KAFKA-2132
> URL: https://issues.apache.org/jira/browse/KAFKA-2132
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Ashish K Singh
>
> The Log4j appender is just a producer.
> Since we have a new producer in the clients module, there's no need to keep the 
> Log4J appender in "core" and force people to package all of Kafka with their apps.
> Let's move the Log4jAppender to the clients module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2034) sourceCompatibility not set in Kafka build.gradle

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2034:
-
Resolution: Fixed
  Reviewer: Ewen Cheslack-Postava
  Assignee: Ismael Juma
Status: Resolved  (was: Patch Available)

Thanks for the patch. Pushed to trunk.

> sourceCompatibility not set in Kafka build.gradle
> -
>
> Key: KAFKA-2034
> URL: https://issues.apache.org/jira/browse/KAFKA-2034
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2.1
>Reporter: Derek Bassett
>Assignee: Ismael Juma
>Priority: Minor
>  Labels: newbie
> Attachments: KAFKA-2034.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The build.gradle does not explicitly set the sourceCompatibility version. 
> This allows Kafka, when built with Java 1.8, to emit class files targeting the 
> wrong version. It would also allow Java 1.8 features to be merged into Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513081#comment-14513081
 ] 

Neha Narkhede commented on KAFKA-1054:
--

Sorry, I missed checking back on this. [~blakesmith] your changes look good. Is 
anyone planning on addressing the changes [~ijuma] suggested? I'm wondering if 
I can push this just yet.

> Eliminate Compilation Warnings for 0.8 Final Release
> 
>
> Key: KAFKA-1054
> URL: https://issues.apache.org/jira/browse/KAFKA-1054
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Guozhang Wang
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1054.patch, KAFKA-1054_Mar_10_2015.patch
>
>
> Currently we have a total of 38 warnings for the source code compilation 
> of 0.8:
> 1) 3 from "Unchecked type pattern"
> 2) 6 from "Unchecked conversion"
> 3) 29 from "Deprecated Hadoop API functions"
> It would be better to finish these before the final release of 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513094#comment-14513094
 ] 

Neha Narkhede commented on KAFKA-2140:
--

+1

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2140) Improve code readability

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2140:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the awesome code cleanup, [~ijuma]. Pushed to trunk

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1351) String.format is very expensive in Scala

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513096#comment-14513096
 ] 

Neha Narkhede commented on KAFKA-1351:
--

Not sure that this is still a concern. I was hoping whoever picks this up can 
do a quick microbenchmark to see if the suggestion in the description is really 
worth a change or not.
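
(A minimal sketch of such a microbenchmark, for illustration only: the message 
text and values are made up, a serious measurement would use a harness like JMH, 
and the interpolation variant requires Scala 2.10+.)

{code}
object StringFormatBench {
  private val iterations = 1000000
  private val topic = "background_queue"
  private val offset = 3722949957L

  // Times one variant; the sink keeps the JIT from eliminating the string work.
  private def bench(label: String)(makeLine: Int => String): Unit = {
    var i = 0
    var sink = 0L
    val start = System.nanoTime()
    while (i < iterations) { sink += makeLine(i).length; i += 1 }
    println("%s: %d ms (sink=%d)".format(label, (System.nanoTime() - start) / 1000000, sink))
  }

  def main(args: Array[String]): Unit = {
    for (round <- 1 to 2) { // treat round 1 as JVM warm-up and read round 2
      println("round " + round)
      bench("String.format")   { i => "Fetched offset %d for topic %s (%d)".format(offset, topic, i) }
      bench("concatenation")   { i => "Fetched offset " + offset + " for topic " + topic + " (" + i + ")" }
      bench("s-interpolation") { i => s"Fetched offset $offset for topic $topic ($i)" }
    }
  }
}
{code}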

> String.format is very expensive in Scala
> 
>
> Key: KAFKA-1351
> URL: https://issues.apache.org/jira/browse/KAFKA-1351
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.7.2, 0.8.0, 0.8.1
>Reporter: Neha Narkhede
>  Labels: newbie
> Fix For: 0.8.3
>
> Attachments: KAFKA-1351.patch, KAFKA-1351_2014-04-07_18:02:18.patch, 
> KAFKA-1351_2014-04-09_15:40:11.patch
>
>
> As found in KAFKA-1350, logging is causing significant overhead in the 
> performance of a Kafka server. There are several info statements that use 
> String.format which is particularly expensive. We should investigate adding 
> our own version of String.format that merely uses string concatenation under 
> the covers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1293) Mirror maker housecleaning

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-1293.
--
Resolution: Fixed

Closing based on [~mwarhaftig]'s latest comment.

> Mirror maker housecleaning
> --
>
> Key: KAFKA-1293
> URL: https://issues.apache.org/jira/browse/KAFKA-1293
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.8.1
>Reporter: Jay Kreps
>Priority: Minor
>  Labels: usability
> Attachments: KAFKA-1293.patch
>
>
> Mirror maker uses its own convention for command-line arguments, e.g. 
> --num.producers, whereas everywhere else follows the unix convention like 
> --num-producers. This is annoying because when running different tools you 
> have to constantly remember the quirks of whoever wrote that tool.
> Mirror maker should also have a top-level wrapper script in bin/ to make tab 
> completion work and so you don't have to remember the fully qualified class 
> name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2132) Move Log4J appender to clients module

2015-04-26 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513110#comment-14513110
 ] 

Joe Stein commented on KAFKA-2132:
--

+1 - make a stand-alone module called log4j/ that has this one class, move the 
class to Java so people don't have the scala dependency

> Move Log4J appender to clients module
> -
>
> Key: KAFKA-2132
> URL: https://issues.apache.org/jira/browse/KAFKA-2132
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Ashish K Singh
>
> The Log4j appender is just a producer.
> Since we have a new producer in the clients module, there's no need to keep the 
> Log4J appender in "core" and force people to package all of Kafka with their apps.
> Let's move the Log4jAppender to the clients module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Kafka-trunk #473

2015-04-26 Thread Apache Jenkins Server
See 

Changes:

[neha.narkhede] KAFKA-2034 sourceCompatibility not set in Kafka build.gradle; 
reviewed by Neha Narkhede and Ewen Cheslack-Postava

[nehanarkhede] KAFKA-2140 Improve code readability; reviewed by Neha Narkhede

--
[...truncated 451 lines...]
org.apache.kafka.common.record.RecordTest > testEquality[51] PASSED

org.apache.kafka.common.record.RecordTest > testFields / testChecksum / testEquality [52]-[63] PASSED

org.apache.kafka.common.record.MemoryRecordsTest > testIterator [0]-[3] PASSED

org.apache.kafka.clients.ClientUtilsTest, NetworkClientTest and MetadataTest tests PASSED

org.apache.kafka.clients.producer.* tests (KafkaProducerTest, ProducerRecordTest, MockProducerTest, RecordSendTest, RecordAccumulatorTest) PASSED

[remaining test output truncated in the original message]

[jira] [Commented] (KAFKA-1748) Decouple system test cluster resources definition from service definitions

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513111#comment-14513111
 ] 

Neha Narkhede commented on KAFKA-1748:
--

[~ewencp] With the ducktape work, I guess we can close this?

> Decouple system test cluster resources definition from service definitions
> --
>
> Key: KAFKA-1748
> URL: https://issues.apache.org/jira/browse/KAFKA-1748
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Attachments: KAFKA-1748.patch, KAFKA-1748_2014-11-03_12:04:18.patch, 
> KAFKA-1748_2014-11-14_14:54:17.patch
>
>
> Currently the system tests use JSON files that specify the set of services 
> for each test and where they should run (i.e. hostname). These currently 
> assume that you already have SSH keys set up, that you use the same username on 
> the host running the tests and the test cluster, that no additional 
> ssh/scp/rsync flags are needed, and that you'll always have a fixed set of 
> compute resources (or that you'll spend a lot of time editing config files).
> While we don't want a whole cluster resource manager in the system tests, a 
> bit more flexibility would make it easier to, e.g., run tests against a local 
> vagrant cluster or on dynamically allocated EC2 instances. We can separate 
> out the basic resource spec (i.e. json specifying how to access machines) 
> from the service definition (i.e. a broker should run with settings x, y, z). 
> Restricting to a very simple set of mappings (i.e. map services to hosts with 
> round robin, optionally restricting to no reuse of hosts) should keep things 
> simple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1277) Keep the summary/description when updating the RB with kafka-patch-review

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-1277.
--
Resolution: Incomplete

Closing due to inactivity. 

> Keep the summary/description when updating the RB with kafka-patch-review
> -
>
> Key: KAFKA-1277
> URL: https://issues.apache.org/jira/browse/KAFKA-1277
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Manikumar Reddy
> Attachments: KAFKA-1277.patch, KAFKA-1277.patch, KAFKA-1277.patch, 
> KAFKA-1277_2014-10-04_16:39:56.patch, KAFKA-1277_2014-10-04_16:51:20.patch, 
> KAFKA-1277_2014-10-04_16:57:30.patch, KAFKA-1277_2014-10-04_17:00:37.patch, 
> KAFKA-1277_2014-10-04_17:01:43.patch, KAFKA-1277_2014-10-04_17:03:08.patch, 
> KAFKA-1277_2014-10-04_17:09:02.patch, KAFKA-1277_2014-10-05_11:04:33.patch, 
> KAFKA-1277_2014-10-05_11:09:08.patch, KAFKA-1277_2014-10-05_11:10:50.patch, 
> KAFKA-1277_2014-10-05_11:18:17.patch
>
>
> Today the kafka-patch-review tool always uses a default title and description 
> if they are not specified, even when updating an existing RB. It would be 
> better to leave the current title/description as is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33342: Patch for KAFKA-2122

2015-04-26 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33342/#review81617
---

Ship it!



core/src/main/scala/kafka/controller/ControllerChannelManager.scala


Given your latest change, this comment is incorrect. I will take it out 
during the merge to save time.


- Neha Narkhede


On April 19, 2015, 7:44 p.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33342/
> ---
> 
> (Updated April 19, 2015, 7:44 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2122
> https://issues.apache.org/jira/browse/KAFKA-2122
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2122. Remove controller.message.queue.size Config.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/controller/ControllerChannelManager.scala 
> 97acdb23f6e95554c3e0357aa112eddfc875efbc 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> cfbbd2be550947dd2b3c8c2cab981fa08fb6d859 
>   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
> 62d183248e3be4c83d2c768e762f61f92448c6a6 
>   system_test/mirror_maker_testsuite/config/server.properties 
> c6284122e3dfaa046a453511c17a19a536dc1035 
>   system_test/offset_management_testsuite/config/server.properties 
> 2b988f86052a7bb248e1c7752808cb0a336d0020 
>   
> system_test/offset_management_testsuite/testcase_7002/config/kafka_server_1.properties
>  41ec6e49272f129ab11a08aec04faf444f1c4f26 
>   
> system_test/offset_management_testsuite/testcase_7002/config/kafka_server_2.properties
>  727e23701d6c2eb08c7ddde198e042921ef058f8 
>   
> system_test/offset_management_testsuite/testcase_7002/config/kafka_server_3.properties
>  e6fbbe1e0532eb393a5b100c8b3c58be4c546ec0 
>   
> system_test/offset_management_testsuite/testcase_7002/config/kafka_server_4.properties
>  fee65bce63564a7403d311be2c27c34e17e835a7 
>   system_test/replication_testsuite/config/server.properties 
> 6becbab60e3943e56a77dc7872febb9764b2dca9 
> 
> Diff: https://reviews.apache.org/r/33342/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



[jira] [Commented] (KAFKA-1993) Enable topic deletion as default

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513123#comment-14513123
 ] 

Neha Narkhede commented on KAFKA-1993:
--

I merged KAFKA-2122. I think we can turn delete topic on by default now.

> Enable topic deletion as default
> 
>
> Key: KAFKA-1993
> URL: https://issues.apache.org/jira/browse/KAFKA-1993
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: KAFKA-1993.patch
>
>
> Since topic deletion is now thoroughly tested and works as well as most Kafka 
> features, we should enable it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2122) Remove controller.message.queue.size Config

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2122:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch. Pushed to trunk

> Remove controller.message.queue.size Config
> ---
>
> Key: KAFKA-2122
> URL: https://issues.apache.org/jira/browse/KAFKA-2122
> Project: Kafka
>  Issue Type: Bug
>Reporter: Onur Karaman
>Assignee: Sriharsha Chintalapani
> Attachments: KAFKA-2122.patch, KAFKA-2122_2015-04-19_12:44:41.patch
>
>
> A deadlock can happen during topic deletion if controller.message.queue.size 
> is overridden to a custom value. Details are here: 
> https://issues.apache.org/jira/browse/KAFKA-2046?focusedCommentId=14380776&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380776
> Given that KAFKA-1993 is enabling delete topic by default, it would be unsafe 
> to simultaneously allow a configurable controller.message.queue.size
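
(A toy sketch of the deadlock shape being referred to - not the actual controller 
code: a thread that holds a lock blocks trying to enqueue into a full bounded 
queue whose consumer needs that same lock, so neither side can make progress; a 
bounded offer, as in the workaround quoted under KAFKA-2029, at least fails fast.)

{code}
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

// Toy illustration (not Kafka code): a thread holding a lock blocks enqueueing into
// a full bounded queue, while the consumer that would drain the queue needs the same
// lock before it can make progress.
object BoundedQueueDeadlockSketch {
  private val lock = new Object
  private val queue = new ArrayBlockingQueue[String](1)

  def main(args: Array[String]): Unit = {
    queue.put("pending")                       // the bounded queue is now full
    val consumer = new Thread(new Runnable {
      def run(): Unit = lock.synchronized {    // consumer needs the lock before draining
        println("drained: " + queue.take())
      }
    })
    lock.synchronized {                        // producer side already holds the lock...
      consumer.start()
      Thread.sleep(200)                        // let the consumer block on the lock
      // queue.put("next") would hang forever here; a bounded offer at least fails fast
      // instead of deadlocking.
      val accepted = queue.offer("next", 1, TimeUnit.SECONDS)
      println("offer accepted: " + accepted)   // prints false: the queue never drained
    }
    consumer.join()
  }
}
{code}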



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: KafkaPreCommit #86

2015-04-26 Thread Apache Jenkins Server
See 

Changes:

[nehanarkhede] KAFKA-2140 Improve code readability; reviewed by Neha Narkhede

[nehanarkhede] KAFKA-2122 Remove controller.message.queue.size Config; reviewed 
by Neha Narkhede, Jun Rao, Onur

--
[...truncated 451 lines...]
org.apache.kafka.common.record.RecordTest > testFields / testChecksum / testEquality [33]-[55] PASSED

[remaining test output truncated in the original message]

[jira] [Updated] (KAFKA-1748) Decouple system test cluster resources definition from service definitions

2015-04-26 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1748:
-
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Agreed, closing in favor of eventually providing a better system test framework.

> Decouple system test cluster resources definition from service definitions
> --
>
> Key: KAFKA-1748
> URL: https://issues.apache.org/jira/browse/KAFKA-1748
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Attachments: KAFKA-1748.patch, KAFKA-1748_2014-11-03_12:04:18.patch, 
> KAFKA-1748_2014-11-14_14:54:17.patch
>
>
> Currently the system tests use JSON files that specify the set of services 
> for each test and where they should run (i.e. hostname). These currently 
> assume that you already have SSH keys set up, that you use the same username on 
> the host running the tests and the test cluster, that no additional 
> ssh/scp/rsync flags are needed, and that you'll always have a fixed set of 
> compute resources (or that you'll spend a lot of time editing config files).
> While we don't want a whole cluster resource manager in the system tests, a 
> bit more flexibility would make it easier to, e.g., run tests against a local 
> vagrant cluster or on dynamically allocated EC2 instances. We can separate 
> out the basic resource spec (i.e. json specifying how to access machines) 
> from the service definition (i.e. a broker should run with settings x, y, z). 
> Restricting to a very simple set of mappings (i.e. map services to hosts with 
> round robin, optionally restricting to no reuse of hosts) should keep things 
> simple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2132) Move Log4J appender to clients module

2015-04-26 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513129#comment-14513129
 ] 

Ashish K Singh commented on KAFKA-2132:
---

Sounds good to me. Will have a patch up for review soon.

> Move Log4J appender to clients module
> -
>
> Key: KAFKA-2132
> URL: https://issues.apache.org/jira/browse/KAFKA-2132
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Ashish K Singh
>
> The Log4j appender is just a producer.
> Since we have a new producer in the clients module, there's no need to keep the 
> Log4J appender in "core" and force people to package all of Kafka with their apps.
> Let's move the Log4jAppender to the clients module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513128#comment-14513128
 ] 

Ismael Juma commented on KAFKA-2140:


Thank you for merging. :)

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1351) String.format is very expensive in Scala

2015-04-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513131#comment-14513131
 ] 

Ismael Juma commented on KAFKA-1351:


Personally, I think the right way to go is to use string interpolation (a Scala 
2.10 feature) once we drop support for Scala 2.9. It's less error-prone (IDEs 
understand it and there are a number of lint tools that check that it's used 
correctly) and performs better. Of course, we should benchmark it before we use 
it.
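
(For illustration, the two styles side by side with made-up values: with 
String.format a mismatch between the format string and the arguments only shows 
up at runtime, while interpolation embeds the values directly, so there is no 
separate format string for tools or humans to get out of sync with.)

{code}
object InterpolationExample {
  def main(args: Array[String]): Unit = {
    val partition = 7
    val offset = 3722949957L

    // String.format style: the format string and the argument list can drift apart,
    // and a mismatch only fails at runtime.
    println("Fetching offset %d for partition %d".format(offset, partition))
    // println("Fetching offset %d for partition %d".format("background_queue", offset)) // IllegalFormatConversionException at runtime

    // Interpolation style (Scala 2.10+): values are embedded directly, so there is no
    // separate format string to keep in sync, and IDEs/lint tools can inspect it.
    println(s"Fetching offset $offset for partition $partition")
  }
}
{code}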

> String.format is very expensive in Scala
> 
>
> Key: KAFKA-1351
> URL: https://issues.apache.org/jira/browse/KAFKA-1351
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.7.2, 0.8.0, 0.8.1
>Reporter: Neha Narkhede
>  Labels: newbie
> Fix For: 0.8.3
>
> Attachments: KAFKA-1351.patch, KAFKA-1351_2014-04-07_18:02:18.patch, 
> KAFKA-1351_2014-04-09_15:40:11.patch
>
>
> As found in KAFKA-1350, logging is causing significant overhead in the 
> performance of a Kafka server. There are several info statements that use 
> String.format which is particularly expensive. We should investigate adding 
> our own version of String.format that merely uses string concatenation under 
> the covers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Kafka-trunk #474

2015-04-26 Thread Apache Jenkins Server
See 

Changes:

[nehanarkhede] KAFKA-2122 Remove controller.message.queue.size Config; reviewed 
by Neha Narkhede, Jun Rao, Onur

--
[...truncated 451 lines...]
org.apache.kafka.common.record.RecordTest > testFields / testChecksum / testEquality [33]-[55] PASSED

[remaining test output truncated in the original message]

[jira] [Commented] (KAFKA-2029) Improving controlled shutdown for rolling updates

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513138#comment-14513138
 ] 

Neha Narkhede commented on KAFKA-2029:
--

[~jkreps] That's somewhat right. When the controller was simple and we didn't 
have the new non-blocking NetworkClient, the single-queue-based blocking I/O 
wasn't a bad place to start. Now it is time to refactor. Though I haven't 
given this a lot of thought, I'm not sure a single-threaded design will 
work. It *might* be possible to end up with a single-threaded controller, 
though I can't be entirely sure. Basically, we have to think about situations 
where something has to happen in the callback, where that callback executes, 
and whether that callback execution should block sending other commands.

[~dmitrybugaychenko] Thanks for looking into this. Some of the problems you 
listed above should be resolved with the unlimited controller-to-broker queue 
change. For the other changes, I highly recommend we look at the new network 
client and propose a design based on that. I also think we should avoid patching 
the current controller and be really, really careful about testing. We have found 
that any change to the controller introduces subtle bugs that cause serious 
instability. Let's start with a design doc and agree on that. I can help review 
it. 

> Improving controlled shutdown for rolling updates
> -
>
> Key: KAFKA-2029
> URL: https://issues.apache.org/jira/browse/KAFKA-2029
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Neha Narkhede
>Priority: Critical
> Attachments: KAFKA-2029.patch, KAFKA-2029.patch
>
>
> Controlled shutdown as currently implemented can cause numerous problems: 
> deadlocks, local and global data loss, partitions without a leader, etc. In 
> some cases the only way to restore the cluster is to stop it completely using 
> kill -9 and start again.
> Note 1: I'm aware of KAFKA-1305, but the proposed workaround of increasing the 
> queue size makes things much worse (see the discussion there).
> Note 2: The problems described here can occur in any setup, but they are 
> extremely painful in setups with large brokers (36 disks, 1000+ assigned 
> partitions per broker in our case).
> Note 3: These improvements are actually workarounds, and it is worth 
> considering a global refactoring of the controller (make it single-threaded, 
> or even get rid of it in favour of ZK leader elections for partitions).
> The problems and improvements are:
> # Controlled shutdown takes a long time (10+ minutes): the broker sends 
> multiple shutdown requests, finally considers the shutdown failed and proceeds 
> to an unclean shutdown, and the controller gets stuck for a while (holding a 
> lock while waiting for free space in the controller-to-broker queue). After 
> the broker comes back up it receives follower requests and erases its high 
> watermarks (with a message that the "replica does not exist" - the controller 
> hadn't yet sent a request with the replica assignment); then, when the 
> controller starts replicas on the broker, it deletes all local data (due to 
> the missing high watermarks). Furthermore, the controller starts processing 
> the pending shutdown request and stops replicas on the broker, leaving it in a 
> non-functional state. A solution might be to increase the time the broker 
> waits for the controller's response to the shutdown request, but this timeout 
> is taken from controller.socket.timeout.ms, which is global for all 
> broker-controller communication, and setting it to 30 minutes is dangerous. 
> *Proposed solution: introduce a dedicated config parameter for this timeout 
> with a high default*.
> # If a broker goes down during controlled shutdown and does not come back, the 
> controller gets stuck in a deadlock (one thread owns the lock and tries to add 
> a message to the dead broker's queue, the send thread is in an infinite loop 
> trying to retry the message to the dead broker, and the broker failure handler 
> is waiting for the lock). There are numerous partitions without a leader and 
> the only way out is to kill -9 the controller. *Proposed solution: add a 
> timeout for adding a message to the broker's queue*. 
> ControllerChannelManager.sendRequest:
> {code}
>   def sendRequest(brokerId : Int, request : RequestOrResponse, callback: 
> (RequestOrResponse) => Unit = null) {
> brokerLock synchronized {
>   val stateInfoOpt = brokerStateInfo.get(brokerId)
>   stateInfoOpt match {
> case Some(stateInfo) =>
>   // ODKL Patch: prevent infinite hang on trying to send message to a 
> dead broker.
>   // TODO: Move timeout to config
>   if (!stateInfo.messageQueue.offer((request, callback), 10, 
> TimeUnit.SECONDS)) {
> error("Timed out trying to send message t

[jira] [Updated] (KAFKA-2029) Improving controlled shutdown for rolling updates

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2029:
-
Reviewer: Neha Narkhede

> Improving controlled shutdown for rolling updates
> -
>
> Key: KAFKA-2029
> URL: https://issues.apache.org/jira/browse/KAFKA-2029
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Neha Narkhede
>Priority: Critical
> Attachments: KAFKA-2029.patch, KAFKA-2029.patch
>
>
> Controlled shutdown as currently implemented can cause numerous problems: 
> deadlocks, local and global data loss, partitions without a leader, etc. In 
> some cases the only way to restore the cluster is to stop it completely using 
> kill -9 and start again.
> Note 1: I'm aware of KAFKA-1305, but the proposed workaround of increasing the 
> queue size makes things much worse (see the discussion there).
> Note 2: The problems described here can occur in any setup, but they are 
> extremely painful in setups with large brokers (36 disks, 1000+ assigned 
> partitions per broker in our case).
> Note 3: These improvements are actually workarounds, and it is worth 
> considering a global refactoring of the controller (make it single-threaded, 
> or even get rid of it in favour of ZK leader elections for partitions).
> The problems and improvements are:
> # Controlled shutdown takes a long time (10+ minutes): the broker sends 
> multiple shutdown requests, finally considers the shutdown failed and proceeds 
> to an unclean shutdown, and the controller gets stuck for a while (holding a 
> lock while waiting for free space in the controller-to-broker queue). After 
> the broker comes back up it receives follower requests and erases its high 
> watermarks (with a message that the "replica does not exist" - the controller 
> hadn't yet sent a request with the replica assignment); then, when the 
> controller starts replicas on the broker, it deletes all local data (due to 
> the missing high watermarks). Furthermore, the controller starts processing 
> the pending shutdown request and stops replicas on the broker, leaving it in a 
> non-functional state. A solution might be to increase the time the broker 
> waits for the controller's response to the shutdown request, but this timeout 
> is taken from controller.socket.timeout.ms, which is global for all 
> broker-controller communication, and setting it to 30 minutes is dangerous. 
> *Proposed solution: introduce a dedicated config parameter for this timeout 
> with a high default*.
> # If a broker goes down during controlled shutdown and does not come back, the 
> controller gets stuck in a deadlock (one thread owns the lock and tries to add 
> a message to the dead broker's queue, the send thread is in an infinite loop 
> trying to retry the message to the dead broker, and the broker failure handler 
> is waiting for the lock). There are numerous partitions without a leader and 
> the only way out is to kill -9 the controller. *Proposed solution: add a 
> timeout for adding a message to the broker's queue*. 
> ControllerChannelManager.sendRequest:
> {code}
>   def sendRequest(brokerId : Int, request : RequestOrResponse, callback: 
> (RequestOrResponse) => Unit = null) {
> brokerLock synchronized {
>   val stateInfoOpt = brokerStateInfo.get(brokerId)
>   stateInfoOpt match {
> case Some(stateInfo) =>
>   // ODKL Patch: prevent infinite hang on trying to send message to a 
> dead broker.
>   // TODO: Move timeout to config
>   if (!stateInfo.messageQueue.offer((request, callback), 10, 
> TimeUnit.SECONDS)) {
> error("Timed out trying to send message to broker " + 
> brokerId.toString)
> // Do not throw, as it brings controller into completely 
> non-functional state
> // "Controller to broker state change requests batch is not empty 
> while creating a new one"
> //throw new IllegalStateException("Timed out trying to send 
> message to broker " + brokerId.toString)
>   }
> case None =>
>   warn("Not sending request %s to broker %d, since it is 
> offline.".format(request, brokerId))
>   }
> }
>   }
> {code}
> # When the broker that is the controller starts shutting down while auto leader 
> rebalance is running, it deadlocks in the end (the shutdown thread owns the lock 
> and waits for the rebalance thread to exit, and the rebalance thread waits for 
> the lock). *Proposed solution: use a bounded wait in the rebalance thread*. 
> KafkaController.scala:
> {code}
>   // ODKL Patch to prevent deadlocks in shutdown.
>   /**
>* Execute the given function inside the lock
>*/
>   def inLockIfRunning[T](lock: ReentrantLock)(fun: => T): T = {
> if (isRunning || lock.isHeldByCurrentThread) {
>   // TODO: Configure timeout.
>   if (!lock.tryLock(10, TimeUnit.SECONDS)) {
> thr

[jira] [Updated] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-873:

Reviewer: Neha Narkhede

> Consider replacing zkclient with curator (with zkclient-bridge)
> ---
>
> Key: KAFKA-873
> URL: https://issues.apache.org/jira/browse/KAFKA-873
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Scott Clasen
>Assignee: Grant Henke
>
> If zkclient were replaced with curator and curator-x-zkclient-bridge, it would 
> initially be a drop-in replacement:
> https://github.com/Netflix/curator/wiki/ZKClient-Bridge
> With the addition of a few more props to ZkConfig, and a bit of code, this 
> would open up the possibility of using ACLs in zookeeper (which aren't 
> supported directly by zkclient), as well as integrating with Netflix 
> Exhibitor for those of us using that.
> Looks like KafkaZookeeperClient needs some love anyhow...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2143) Replicas get ahead of leader and fail

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2143:
-
Assignee: (was: Neha Narkhede)

> Replicas get ahead of leader and fail
> -
>
> Key: KAFKA-2143
> URL: https://issues.apache.org/jira/browse/KAFKA-2143
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.8.2.1
>Reporter: Evan Huus
>
> On a cluster of 6 nodes, we recently saw a case where a single 
> under-replicated partition suddenly appeared, replication lag spiked, and 
> network IO spiked. The cluster appeared to recover eventually on its own.
> Looking at the logs, the thing which failed was partition 7 of the topic 
> {{background_queue}}. It had an ISR of 1,4,3 and its leader at the time was 
> 3. Here are the interesting log lines:
> On node 3 (the leader):
> {noformat}
> [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when 
> processing fetch request for partition [background_queue,7] offset 3722949957 
> from follower with correlation id 148185816. Possible cause: Request for 
> offset 3722949957 but we only have log segments in the range 3648049863 to 
> 3722949955. (kafka.server.ReplicaManager)
> [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when 
> processing fetch request for partition [background_queue,7] offset 3722949957 
> from follower with correlation id 156007054. Possible cause: Request for 
> offset 3722949957 but we only have log segments in the range 3648049863 to 
> 3722949955. (kafka.server.ReplicaManager)
> [2015-04-23 16:50:13,960] INFO Partition [background_queue,7] on broker 3: 
> Shrinking ISR for partition [background_queue,7] from 1,4,3 to 3 
> (kafka.cluster.Partition)
> {noformat}
> Note that both replicas suddenly asked for an offset *ahead* of the available 
> offsets.
> And on nodes 1 and 4 (the replicas) many occurrences of the following:
> {noformat}
> [2015-04-23 16:50:05,935] INFO Scheduling log segment 3648049863 for log 
> background_queue-7 for deletion. (kafka.log.Log) (edited)
> {noformat}
> Based on my reading, this looks like the replicas somehow got *ahead* of the 
> leader, asked for an invalid offset, got confused, and re-replicated the 
> entire topic from scratch to recover (this matches our network graphs, which 
> show 3 sending a bunch of data to 1 and 4).
> Taking a stab in the dark at the cause, there appears to be a race condition 
> where replicas can receive a new offset before the leader has committed it 
> and is ready to replicate?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2082) Kafka Replication ends up in a bad state

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2082:
-
Labels: zkclient-problems  (was: )

> Kafka Replication ends up in a bad state
> 
>
> Key: KAFKA-2082
> URL: https://issues.apache.org/jira/browse/KAFKA-2082
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.8.2.1
>Reporter: Evan Huus
>Assignee: Sriharsha Chintalapani
>Priority: Critical
>  Labels: zkclient-problems
> Attachments: KAFKA-2082.patch
>
>
> While running integration tests for Sarama (the go client) we came across a 
> pattern of connection losses that reliably puts kafka into a bad state: 
> several of the brokers start spinning, chewing ~30% CPU and spamming the logs 
> with hundreds of thousands of lines like:
> {noformat}
> [2015-04-01 13:08:40,070] WARN [Replica Manager on Broker 9093]: Fetch 
> request with correlation id 111094 from client ReplicaFetcherThread-0-9093 on 
> partition [many_partition,1] failed due to Leader not local for partition 
> [many_partition,1] on broker 9093 (kafka.server.ReplicaManager)
> [2015-04-01 13:08:40,070] WARN [Replica Manager on Broker 9093]: Fetch 
> request with correlation id 111094 from client ReplicaFetcherThread-0-9093 on 
> partition [many_partition,6] failed due to Leader not local for partition 
> [many_partition,6] on broker 9093 (kafka.server.ReplicaManager)
> [2015-04-01 13:08:40,070] WARN [Replica Manager on Broker 9093]: Fetch 
> request with correlation id 111095 from client ReplicaFetcherThread-0-9093 on 
> partition [many_partition,21] failed due to Leader not local for partition 
> [many_partition,21] on broker 9093 (kafka.server.ReplicaManager)
> [2015-04-01 13:08:40,071] WARN [Replica Manager on Broker 9093]: Fetch 
> request with correlation id 111095 from client ReplicaFetcherThread-0-9093 on 
> partition [many_partition,26] failed due to Leader not local for partition 
> [many_partition,26] on broker 9093 (kafka.server.ReplicaManager)
> [2015-04-01 13:08:40,071] WARN [Replica Manager on Broker 9093]: Fetch 
> request with correlation id 111095 from client ReplicaFetcherThread-0-9093 on 
> partition [many_partition,1] failed due to Leader not local for partition 
> [many_partition,1] on broker 9093 (kafka.server.ReplicaManager)
> [2015-04-01 13:08:40,071] WARN [Replica Manager on Broker 9093]: Fetch 
> request with correlation id 111095 from client ReplicaFetcherThread-0-9093 on 
> partition [many_partition,6] failed due to Leader not local for partition 
> [many_partition,6] on broker 9093 (kafka.server.ReplicaManager)
> [2015-04-01 13:08:40,072] WARN [Replica Manager on Broker 9093]: Fetch 
> request with correlation id 111096 from client ReplicaFetcherThread-0-9093 on 
> partition [many_partition,21] failed due to Leader not local for partition 
> [many_partition,21] on broker 9093 (kafka.server.ReplicaManager)
> [2015-04-01 13:08:40,072] WARN [Replica Manager on Broker 9093]: Fetch 
> request with correlation id 111096 from client ReplicaFetcherThread-0-9093 on 
> partition [many_partition,26] failed due to Leader not local for partition 
> [many_partition,26] on broker 9093 (kafka.server.ReplicaManager)
> {noformat}
> This can be easily and reliably reproduced using the {{toxiproxy-final}} 
> branch of https://github.com/Shopify/sarama which includes a vagrant script 
> for provisioning the appropriate cluster: 
> - {{git clone https://github.com/Shopify/sarama.git}}
> - {{git checkout test-jira-kafka-2082}}
> - {{vagrant up}}
> - {{TEST_SEED=1427917826425719059 DEBUG=true go test -v}}
> After the test finishes (it fails because the cluster ends up in a bad 
> state), you can log into the cluster machine with {{vagrant ssh}} and inspect 
> the bad nodes. The vagrant script provisions five zookeepers and five brokers 
> in {{/opt/kafka-9091/}} through {{/opt/kafka-9095/}}.
> Additional context: the test produces continually to the cluster while 
> randomly cutting and restoring zookeeper connections (all connections to 
> zookeeper are run through a simple proxy on the same vm to make this easy). 
> The majority of the time this works very well and does a good job exercising 
> our producer's retry and failover code. However, under certain patterns of 
> connection loss (the {{TEST_SEED}} in the instructions is important), kafka 
> gets confused. The test never cuts more than two connections at a time, so 
> zookeeper should always have quorum, and the topic (with three replicas) 
> should always be writable.
> Completely restarting the cluster via {{vagrant reload}} seems to put it back 
> into a sane state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2027) kafka never notifies the zookeeper client when a partition moved with due to an auto-rebalance (when auto.leader.rebalance.enable=true)

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513151#comment-14513151
 ] 

Neha Narkhede commented on KAFKA-2027:
--

Can you describe the end user behavior that you are looking for and the reason 
behind it?

> kafka never notifies the zookeeper client when a partition moved with due to 
> an auto-rebalance (when auto.leader.rebalance.enable=true)
> ---
>
> Key: KAFKA-2027
> URL: https://issues.apache.org/jira/browse/KAFKA-2027
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.1.1
> Environment: Kafka 0.8.1.1, Node.js & Mac OS
>Reporter: Sampath Reddy Lambu
>Assignee: Neha Narkhede
>Priority: Blocker
>
> I would like to report an issue when auto.leader.rebalance.enable=true. Kafka 
> never sends an event/notification to its zookeeper client after a preferred 
> leader election completes. This works fine with a manual rebalance from the CLI 
> (kafka-preferred-replica-election.sh).
> Initially I thought this issue was with Kafka-Node, but it's not. 
> An event should be emitted from zookeeper if any partition leader moved during 
> a preferred election.
> I'm working with kafka_2.9.2-0.8.1.1 (brokers) & Kafka-Node (Node.js).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 30196: Patch for KAFKA-1886

2015-04-26 Thread Neha Narkhede


> On Feb. 7, 2015, 4:22 p.m., Neha Narkhede wrote:
> > core/src/test/scala/unit/kafka/integration/PrimitiveApiTest.scala, line 295
> > 
> >
> > Why do you need the sleep here? We try to avoid blindly sleeping in 
> > Kafka tests since it almost always leads to transient test failures. 
> > Consider using TestUtils.waitUntilTrue().
> 
> Aditya Auradkar wrote:
> Thanks Neha. I missed this review comment.
> 
> I agree sleeping isn't ideal here but I don't think there is a condition 
> I can wait on to trigger this specific exception. The client has to be 
> waiting on a response from the server. I'm not even sure that this testcase 
> needs to exist in PrimitiveApiTest since it isn't testing an API. Can you 
> suggest a better place to put it, if it makes sense to keep it at all?

Yeah, if we can't make a foolproof unit test, let's remove it. We are really 
trying to reduce the number of randomly failing unit tests.
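
(For reference, a minimal sketch of the polling pattern referred to above; the flag, the message and the exact waitUntilTrue signature are assumptions based on the test utilities of that era, not code from the patch.)

{code}
import kafka.utils.TestUtils

// hypothetical flag that the code under test would flip when the response arrives
@volatile var responseReceived = false

// poll for the condition instead of sleeping a fixed amount of time
TestUtils.waitUntilTrue(
  () => responseReceived,
  "Client did not receive a response within the timeout")
{code}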


- Neha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30196/#review71556
---


On Feb. 2, 2015, 9:57 p.m., Aditya Auradkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30196/
> ---
> 
> (Updated Feb. 2, 2015, 9:57 p.m.)
> 
> 
> Review request for kafka and Joel Koshy.
> 
> 
> Bugs: KAFKA-1886
> https://issues.apache.org/jira/browse/KAFKA-1886
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing KAFKA-1886. SimpleConsumer should not swallow 
> ClosedByInterruptException
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/consumer/SimpleConsumer.scala 
> cbef84ac76e62768981f74e71d451f2bda995275 
>   core/src/test/scala/unit/kafka/integration/PrimitiveApiTest.scala 
> aeb7a19acaefabcc161c2ee6144a56d9a8999a81 
> 
> Diff: https://reviews.apache.org/r/30196/diff/
> 
> 
> Testing
> ---
> 
> Added an integration test to PrimitiveAPITest.scala.
> 
> 
> Thanks,
> 
> Aditya Auradkar
> 
>



[jira] [Commented] (KAFKA-1886) SimpleConsumer swallowing ClosedByInterruptException

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513154#comment-14513154
 ] 

Neha Narkhede commented on KAFKA-1886:
--

[~aauradkar] Took a quick look at your patch again. Are you planning on fixing 
it so we can merge? Realize that the simple consumer changes are going to 
matter less as we make more progress on the new consumer.

> SimpleConsumer swallowing ClosedByInterruptException
> 
>
> Key: KAFKA-1886
> URL: https://issues.apache.org/jira/browse/KAFKA-1886
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Aditya A Auradkar
>Assignee: Aditya Auradkar
> Attachments: KAFKA-1886.patch, KAFKA-1886.patch, 
> KAFKA-1886_2015-02-02_13:57:23.patch
>
>
> This issue was originally reported by a Samza developer. I've included an 
> exchange of mine with Chris Riccomini. I'm trying to reproduce the problem on 
> my dev setup.
> From: criccomi
> Hey all,
> Samza's BrokerProxy [1] threads appear to be wedging randomly when we try to 
> interrupt its fetcher thread. I noticed that SimpleConsumer.scala catches 
> Throwable in its sendRequest method [2]. I'm wondering: if 
> blockingChannel.send/receive throws a ClosedByInterruptException
> when the thread is interrupted, what happens? It looks like sendRequest will 
> catch the exception (which I
> think clears the thread's interrupted flag), and then retries the send. If 
> the send succeeds on the retry, I think that the ClosedByInterruptException 
> exception is effectively swallowed, and the BrokerProxy will continue
> fetching messages as though its thread was never interrupted.
> Am I misunderstanding how things work?
> Cheers,
> Chris
> [1] 
> https://github.com/apache/incubator-samza/blob/master/samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala#L126
> [2] 
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L75
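
(Illustrative sketch of the concern described above, not SimpleConsumer itself; sendWithRetry, doSend and reconnect are hypothetical names.)

{code}
import java.nio.channels.ClosedByInterruptException

// Catching Throwable around a blocking send and retrying swallows a
// ClosedByInterruptException together with the thread's interrupt status.
// One hedged fix: rethrow interrupt-related exceptions before the retry path.
def sendWithRetry[T](doSend: () => T, reconnect: () => Unit): T = {
  try {
    doSend()
  } catch {
    case e: ClosedByInterruptException =>
      Thread.currentThread().interrupt()   // restore the interrupt status
      throw e                              // never retry an interrupted send
    case _: Throwable =>
      reconnect()                          // placeholder for the existing retry path
      doSend()
  }
}
{code}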



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1907) ZkClient can block controlled shutdown indefinitely

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1907:
-
Labels: zkclient-problems  (was: )

> ZkClient can block controlled shutdown indefinitely
> ---
>
> Key: KAFKA-1907
> URL: https://issues.apache.org/jira/browse/KAFKA-1907
> Project: Kafka
>  Issue Type: Bug
>  Components: core, zkclient
>Affects Versions: 0.8.2.0
>Reporter: Ewen Cheslack-Postava
>Assignee: jaikiran pai
>  Labels: zkclient-problems
> Attachments: KAFKA-1907.patch
>
>
> There are some calls to ZkClient via ZkUtils in 
> KafkaServer.controlledShutdown() that can block indefinitely because they 
> internally call waitUntilConnected. The ZkClient API doesn't provide an 
> alternative with timeouts, so fixing this will require enforcing timeouts in 
> some other way.
> This may be a more general issue if there are any non daemon threads that 
> also call ZkUtils methods.
> Stacktrace showing the issue:
> {code}
> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x70a93368> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
> at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
> at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
> at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
> at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
> at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
> at 
> kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
> at 
> kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
> at kafka.utils.Utils$.swallow(Utils.scala:172)
> at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
> at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> at kafka.utils.Logging$class.swallow(Logging.scala:94)
> at kafka.utils.Utils$.swallow(Utils.scala:45)
> at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
> at 
> kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
> at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> {code}
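
(A hedged sketch of one way to enforce such a timeout by running the blocking ZkUtils call on a single-use executor; withTimeout, zkReadTimeoutMs and readControllerId are illustrative placeholders, not the actual KafkaServer code.)

{code}
import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

// Run a potentially blocking call with an upper bound on the wait time.
def withTimeout[T](timeoutMs: Long)(block: => T): Option[T] = {
  val executor = Executors.newSingleThreadExecutor()
  val future = executor.submit(new Callable[T] { def call(): T = block })
  try Some(future.get(timeoutMs, TimeUnit.MILLISECONDS))
  catch { case _: TimeoutException => future.cancel(true); None }
  finally executor.shutdownNow()
}

val zkReadTimeoutMs = 6000L
def readControllerId(): Int = 0   // placeholder for the ZkUtils.getController call
val controllerId: Option[Int] = withTimeout(zkReadTimeoutMs) { readControllerId() }
{code}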



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1887) controller error message on shutting the last broker

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1887:
-
Reviewer: Neha Narkhede

> controller error message on shutting the last broker
> 
>
> Key: KAFKA-1887
> URL: https://issues.apache.org/jira/browse/KAFKA-1887
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jun Rao
>Assignee: Sriharsha Chintalapani
>Priority: Minor
> Fix For: 0.8.3
>
> Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch
>
>
> We always see the following error in state-change log on shutting down the 
> last broker.
> [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
> for partition [test,0] from OfflinePartition to OnlinePartition failed 
> (state.change.logger)
> kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
> alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
> at 
> kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> at 
> kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
> at 
> kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
> at 
> kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at kafka.utils.Utils$.inLock(Utils.scala:535)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
> at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513159#comment-14513159
 ] 

Neha Narkhede commented on KAFKA-1887:
--

[~sriharsha] The patch is waiting for a comment. If you can squeeze that in, 
I'll help you merge this change. 

> controller error message on shutting the last broker
> 
>
> Key: KAFKA-1887
> URL: https://issues.apache.org/jira/browse/KAFKA-1887
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jun Rao
>Assignee: Sriharsha Chintalapani
>Priority: Minor
> Fix For: 0.8.3
>
> Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch
>
>
> We always see the following error in state-change log on shutting down the 
> last broker.
> [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
> for partition [test,0] from OfflinePartition to OnlinePartition failed 
> (state.change.logger)
> kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
> alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
> at 
> kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> at 
> kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
> at 
> kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> at 
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
> at 
> kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at kafka.utils.Utils$.inLock(Utils.scala:535)
> at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
> at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: Kafka 1595 remove deprecated json parser

2015-04-26 Thread ijuma
Github user ijuma closed the pull request at:

https://github.com/apache/kafka/pull/55


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Review Request 30801: Patch for KAFKA-1758

2015-04-26 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30801/#review81622
---



core/src/main/scala/kafka/log/LogManager.scala


Is there a reason we are limiting this to only NumberFormatException? Seems 
like a fix that applies to all errors. 

Also, worth changing the error message to a more generic statement about 
the problem and the fix (resetting the recovery checkpoint to 0).
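
(For illustration only, a minimal sketch of the broader handling suggested here; the names are placeholders and println stands in for the logger, so this is not the patch under review.)

{code}
import kafka.common.TopicAndPartition

// Treat any failure to read or parse the checkpoint file as a corrupt checkpoint
// and fall back to recovery point 0 instead of failing broker startup.
def safeRecoveryPoints(readCheckpoint: () => Map[TopicAndPartition, Long]): Map[TopicAndPartition, Long] =
  try readCheckpoint()
  catch {
    case e: Exception =>
      println("Error reading recovery-point-offset-checkpoint, resetting recovery points to 0: " + e)
      Map.empty[TopicAndPartition, Long]
  }
{code}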


- Neha Narkhede


On Feb. 9, 2015, 6:37 p.m., Manikumar Reddy O wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30801/
> ---
> 
> (Updated Feb. 9, 2015, 6:37 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1758
> https://issues.apache.org/jira/browse/KAFKA-1758
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> handled NumberFormatException while reading recovery-point-offset-checkpoint 
> file
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/log/LogManager.scala 
> 47d250af62c1aa53d11204a332d0684fb4217c8d 
> 
> Diff: https://reviews.apache.org/r/30801/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Manikumar Reddy O
> 
>



[jira] [Updated] (KAFKA-1758) corrupt recovery file prevents startup

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1758:
-
Reviewer: Neha Narkhede

> corrupt recovery file prevents startup
> --
>
> Key: KAFKA-1758
> URL: https://issues.apache.org/jira/browse/KAFKA-1758
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Jason Rosenberg
>Assignee: Manikumar Reddy
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1758.patch
>
>
> Hi,
> We recently had a kafka node go down suddenly. When it came back up, it 
> apparently had a corrupt recovery file, and refused to start up:
> {code}
> 2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error starting up 
> KafkaServer
> java.lang.NumberFormatException: For input string: 
> "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:481)
> at java.lang.Integer.parseInt(Integer.java:527)
> at 
> scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
> at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
> at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at 
> scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
> at kafka.log.LogManager.loadLogs(LogManager.scala:105)
> at kafka.log.LogManager.(LogManager.scala:57)
> at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
> {code}
> And the app is under a monitor (so it was repeatedly restarting and failing 
> with this error for several minutes before we got to it)…
> We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and it 
> then restarted cleanly (but of course re-synced all it’s data from replicas, 
> so we had no data loss).
> Anyway, I’m wondering if that’s the expected behavior? Or should it not 
> declare it corrupt and then proceed automatically to an unclean restart?
> Should this NumberFormatException be handled a bit more gracefully?
> We saved the corrupt file if it’s worth inspecting (although I doubt it will 
> be useful!)….
> The corrupt files appeared to be all zeroes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1758) corrupt recovery file prevents startup

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513186#comment-14513186
 ] 

Neha Narkhede commented on KAFKA-1758:
--

[~omkreddy] I took a quick look and left a few review comments. Should be able 
to merge once you fix those.

> corrupt recovery file prevents startup
> --
>
> Key: KAFKA-1758
> URL: https://issues.apache.org/jira/browse/KAFKA-1758
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Jason Rosenberg
>Assignee: Manikumar Reddy
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1758.patch
>
>
> Hi,
> We recently had a kafka node go down suddenly. When it came back up, it 
> apparently had a corrupt recovery file, and refused to start up:
> {code}
> 2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error starting up 
> KafkaServer
> java.lang.NumberFormatException: For input string: 
> "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:481)
> at java.lang.Integer.parseInt(Integer.java:527)
> at 
> scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
> at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
> at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at 
> scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
> at kafka.log.LogManager.loadLogs(LogManager.scala:105)
> at kafka.log.LogManager.(LogManager.scala:57)
> at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
> {code}
> And the app is under a monitor (so it was repeatedly restarting and failing 
> with this error for several minutes before we got to it)…
> We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and it 
> then restarted cleanly (but of course re-synced all it’s data from replicas, 
> so we had no data loss).
> Anyway, I’m wondering if that’s the expected behavior? Or should it not 
> declare it corrupt and then proceed automatically to an unclean restart?
> Should this NumberFormatException be handled a bit more gracefully?
> We saved the corrupt file if it’s worth inspecting (although I doubt it will 
> be useful!)….
> The corrupt files appeared to be all zeroes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513187#comment-14513187
 ] 

Ismael Juma commented on KAFKA-2140:


It looks like something weird happened in the merge. I just checked the patch 
and it renames the file for ConsumerRebalanceFailedException. In the merged 
commit, that file was simply deleted:

https://github.com/apache/kafka/commit/ed1a548c503f041c025e00e75d338b4fc4a51f47#diff-cf49c292e07d8fdbd84469748bb34553L24

This is causing a build failure at the moment.

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2140) Improve code readability

2015-04-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2140:
---
Attachment: KAFKA-2140-fix.patch

[~nehanarkhede], I attached a patch that restores the file that was somehow 
deleted in the merged commit.

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140-fix.patch, KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513200#comment-14513200
 ] 

Neha Narkhede commented on KAFKA-2140:
--

Forgot to explicitly add this. It will be much better when we move everything to 
github. Looks like you are a github user. Any interest in helping Kafka move 
over? ;)

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140-fix.patch, KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513202#comment-14513202
 ] 

Neha Narkhede commented on KAFKA-2140:
--

Fixed and pushed.

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140-fix.patch, KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2029) Improving controlled shutdown for rolling updates

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513138#comment-14513138
 ] 

Neha Narkhede edited comment on KAFKA-2029 at 4/26/15 7:15 PM:
---

[~jkreps] That's somewhat right. When the controller was simple and we didn't 
have the new non blocking NetworkClient, the single queue based blocking I/O 
wasn't a bad place to start. Now, it is time to refactor. Though I haven't 
given this a lot of thought - It *might* be possible to end up with a 
single-threaded controller. Basically, we have to think about situations when 
something has to happen in the callback and where that callback executes and if 
that callback execution should block sending other commands.

[~dmitrybugaychenko] Thanks for looking into this. Some of the problems you 
listed above should be resolved with the unlimited controller to broker queue 
change. For other changes, I highly recommend we look at the new network client 
and propose a design based on that. I also think we should avoid patching the 
current controller and be really really careful about testing. We have found 
any change to the controller to introduce subtle bugs causing serious 
instability. Let's start with a design doc and agree on that. I can help review 
it. 


was (Author: nehanarkhede):
[~jkreps] That's somewhat right. When the controller was simple and we didn't 
have the new non blocking NetworkClient, the single queue based blocking I/O 
wasn't a bad place to start. Now, it is time to refactor. Though I haven't 
given this a lot of thought - It *might* be possible to end up with a 
single-threaded controller, though I can't be entirely sure. Basically, we have 
to think about situations when something has to happen in the callback and 
where that callback executes and if that callback execution should block 
sending other commands.

[~dmitrybugaychenko] Thanks for looking into this. Some of the problems you 
listed above should be resolved with the unlimited controller to broker queue 
change. For other changes, I highly recommend we look at the new network client 
and propose a design based on that. I also think we should avoid patching the 
current controller and be really really careful about testing. We have found 
any change to the controller to introduce subtle bugs causing serious 
instability. Let's start with a design doc and agree on that. I can help review 
it. 

> Improving controlled shutdown for rolling updates
> -
>
> Key: KAFKA-2029
> URL: https://issues.apache.org/jira/browse/KAFKA-2029
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Neha Narkhede
>Priority: Critical
> Attachments: KAFKA-2029.patch, KAFKA-2029.patch
>
>
> Controlled shutdown as currently implemented can cause numerous problems: 
> deadlocks, local and global data loss, partitions without a leader, etc. In 
> some cases the only way to restore the cluster is to stop it completely with 
> kill -9 and start it again.
> Note 1: I'm aware of KAFKA-1305, but the proposed workaround to increase the 
> queue size makes things much worse (see the discussion there).
> Note 2: The problems described here can occur in any setup, but they are 
> extremely painful in setups with large brokers (36 disks, 1000+ assigned 
> partitions per broker in our case).
> Note 3: These improvements are really workarounds, and it is worth considering 
> a global refactoring of the controller (make it single-threaded, or even get 
> rid of it in favour of ZK leader elections for partitions).
> The problems and improvements are:
> # Controlled shutdown takes a long time (10+ minutes): the broker sends multiple 
> shutdown requests, finally considers the controlled shutdown failed and proceeds 
> to an unclean shutdown, while the controller gets stuck for a while (holding a 
> lock while waiting for free space in the controller-to-broker queue). After the 
> broker comes back up it receives a follower request and erases its highwatermarks 
> (with a message that the "replica does not exist" - the controller hadn't yet 
> sent a request with the replica assignment); then, when the controller starts 
> replicas on the broker, it deletes all of its local data (due to the missing 
> highwatermarks). Furthermore, the controller starts processing the pending 
> shutdown request and stops replicas on the broker, leaving it in a 
> non-functional state. A solution might be to increase the time the broker waits 
> for the controller's response to the shutdown request, but this timeout is taken 
> from controller.socket.timeout.ms, which is global for all broker-controller 
> communication, and setting it to 30 minutes is dangerous. *Proposed solution: 
> introduce a dedicated config parameter for this timeout with a high default*.
> # If

[jira] [Comment Edited] (KAFKA-2029) Improving controlled shutdown for rolling updates

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513138#comment-14513138
 ] 

Neha Narkhede edited comment on KAFKA-2029 at 4/26/15 7:15 PM:
---

[~jkreps] That's somewhat right. When the controller was simple and we didn't 
have the new non blocking NetworkClient, the single queue based blocking I/O 
wasn't a bad place to start. Now, it is time to refactor. Though I haven't 
given this a lot of thought - It *might* be possible to end up with a 
single-threaded controller, though I can't be entirely sure. Basically, we have 
to think about situations when something has to happen in the callback and 
where that callback executes and if that callback execution should block 
sending other commands.

[~dmitrybugaychenko] Thanks for looking into this. Some of the problems you 
listed above should be resolved with the unlimited controller to broker queue 
change. For other changes, I highly recommend we look at the new network client 
and propose a design based on that. I also think we should avoid patching the 
current controller and be really really careful about testing. We have found 
any change to the controller to introduce subtle bugs causing serious 
instability. Let's start with a design doc and agree on that. I can help review 
it. 


was (Author: nehanarkhede):
[~jkreps] That's somewhat right. When the controller was simple and we didn't 
have the new non blocking NetworkClient, the single queue based blocking I/O 
wasn't a bad place to start. Now, it is time to refactor. Though I haven't 
given this a lot of thought, I'm not sure if a single-threaded design will 
work. It *might* be possible to end up with a single-threaded controller, 
though I can't be entirely sure. Basically, we have to think about situations 
when something has to happen in the callback and where that callback executes 
and if that callback execution should block sending other commands.

[~dmitrybugaychenko] Thanks for looking into this. Some of the problems you 
listed above should be resolved with the unlimited controller to broker queue 
change. For other changes, I highly recommend we look at the new network client 
and propose a design based on that. I also think we should avoid patching the 
current controller and be really really careful about testing. We have found 
any change to the controller to introduce subtle bugs causing serious 
instability. Let's start with a design doc and agree on that. I can help review 
it. 

> Improving controlled shutdown for rolling updates
> -
>
> Key: KAFKA-2029
> URL: https://issues.apache.org/jira/browse/KAFKA-2029
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Neha Narkhede
>Priority: Critical
> Attachments: KAFKA-2029.patch, KAFKA-2029.patch
>
>
> Controlled shutdown as currently implemented can cause numerous problems: 
> deadlocks, local and global data loss, partitions without a leader, etc. In 
> some cases the only way to restore the cluster is to stop it completely with 
> kill -9 and start it again.
> Note 1: I'm aware of KAFKA-1305, but the proposed workaround to increase the 
> queue size makes things much worse (see the discussion there).
> Note 2: The problems described here can occur in any setup, but they are 
> extremely painful in setups with large brokers (36 disks, 1000+ assigned 
> partitions per broker in our case).
> Note 3: These improvements are really workarounds, and it is worth considering 
> a global refactoring of the controller (make it single-threaded, or even get 
> rid of it in favour of ZK leader elections for partitions).
> The problems and improvements are:
> # Controlled shutdown takes a long time (10+ minutes): the broker sends multiple 
> shutdown requests, finally considers the controlled shutdown failed and proceeds 
> to an unclean shutdown, while the controller gets stuck for a while (holding a 
> lock while waiting for free space in the controller-to-broker queue). After the 
> broker comes back up it receives a follower request and erases its highwatermarks 
> (with a message that the "replica does not exist" - the controller hadn't yet 
> sent a request with the replica assignment); then, when the controller starts 
> replicas on the broker, it deletes all of its local data (due to the missing 
> highwatermarks). Furthermore, the controller starts processing the pending 
> shutdown request and stops replicas on the broker, leaving it in a 
> non-functional state. A solution might be to increase the time the broker waits 
> for the controller's response to the shutdown request, but this timeout is taken 
> from controller.socket.timeout.ms, which is global for all broker-controller 
> communication, and setting it to 30 minutes is dangerous. *Proposed solution: 

[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513203#comment-14513203
 ] 

Ismael Juma commented on KAFKA-2140:


Yes, I'd be happy to help with that. It would make my life a lot easier. :) The 
repository is already in GitHub (https://github.com/apache/kafka), so what is 
missing?

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140-fix.patch, KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2150) FetcherThread backoff need to grab lock before wait on condition.

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2150:
-
Reviewer: Guozhang Wang

[~guozhang] This was introduced in KAFKA-1461, so assigning to you for review 
since you reviewed that one :)

> FetcherThread backoff need to grab lock before wait on condition.
> -
>
> Key: KAFKA-2150
> URL: https://issues.apache.org/jira/browse/KAFKA-2150
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jiangjie Qin
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Attachments: KAFKA-2150.patch, KAFKA-2150_2015-04-25_13:14:05.patch, 
> KAFKA-2150_2015-04-25_13:18:35.patch, KAFKA-2150_2015-04-25_13:35:36.patch
>
>
> Saw the following error: 
> kafka.api.ProducerBounceTest > testBrokerFailure STANDARD_OUT
> [2015-04-25 00:40:43,997] ERROR [ReplicaFetcherThread-0-0], Error due to  
> (kafka.server.ReplicaFetcherThread:103)
> java.lang.IllegalMonitorStateException
>   at 
> java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:127)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1239)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.fullyRelease(AbstractQueuedSynchronizer.java:1668)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2107)
>   at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:95)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> [2015-04-25 00:40:47,064] ERROR [ReplicaFetcherThread-0-1], Error due to  
> (kafka.server.ReplicaFetcherThread:103)
> java.lang.IllegalMonitorStateException
>   at 
> java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:127)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1239)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.fullyRelease(AbstractQueuedSynchronizer.java:1668)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2107)
>   at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:95)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> We should grab the lock before waiting on the condition.
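
(Illustrative sketch of the corrected pattern: a ReentrantLock condition may only be awaited while its owning lock is held, otherwise await() fails with IllegalMonitorStateException as in the log above; fetchBackOffMs is a placeholder.)

{code}
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock

val partitionMapLock = new ReentrantLock()
val partitionMapCond = partitionMapLock.newCondition()
val fetchBackOffMs = 1000L   // placeholder for the configured backoff

partitionMapLock.lock()
try {
  // back off instead of spinning when there are no partitions to fetch
  partitionMapCond.await(fetchBackOffMs, TimeUnit.MILLISECONDS)
} finally {
  partitionMapLock.unlock()
}
{code}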



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2105) NullPointerException in client on MetadataRequest

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2105:
-
Status: Patch Available  (was: Open)

> NullPointerException in client on MetadataRequest
> -
>
> Key: KAFKA-2105
> URL: https://issues.apache.org/jira/browse/KAFKA-2105
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.8.2.1
>Reporter: Roger Hoover
>Priority: Minor
> Attachments: guard-from-null.patch
>
>
> With the new producer, if you accidentally call 
> KafkaProducer.partitionsFor(null), it will cause the I/O thread to throw an NPE.
> Uncaught error in kafka producer I/O thread: 
> java.lang.NullPointerException
>   at org.apache.kafka.common.utils.Utils.utf8Length(Utils.java:174)
>   at org.apache.kafka.common.protocol.types.Type$5.sizeOf(Type.java:176)
>   at 
> org.apache.kafka.common.protocol.types.ArrayOf.sizeOf(ArrayOf.java:55)
>   at org.apache.kafka.common.protocol.types.Schema.sizeOf(Schema.java:81)
>   at org.apache.kafka.common.protocol.types.Struct.sizeOf(Struct.java:218)
>   at 
> org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35)
>   at 
> org.apache.kafka.common.requests.RequestSend.(RequestSend.java:29)
>   at 
> org.apache.kafka.clients.NetworkClient.metadataRequest(NetworkClient.java:369)
>   at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:391)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:188)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1940) Initial checkout and build failing

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1940:
-
Status: Patch Available  (was: Open)

> Initial checkout and build failing
> --
>
> Key: KAFKA-1940
> URL: https://issues.apache.org/jira/browse/KAFKA-1940
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.1.2
> Environment: Groovy:   1.8.6
> Ant:  Apache Ant(TM) version 1.9.2 compiled on July 8 2013
> Ivy:  2.2.0
> JVM:  1.8.0_25 (Oracle Corporation 25.25-b02)
> OS:   Windows 7 6.1 amd64
>Reporter: Martin Lemanski
>  Labels: build
> Attachments: zinc-upgrade.patch
>
>
> when performing `gradle wrapper` and `gradlew build` as a "new" developer, I 
> get an exception: 
> {code}
> C:\development\git\kafka>gradlew build --stacktrace
> <...>
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':core:compileScala'.
> > com.typesafe.zinc.Setup.create(Lcom/typesafe/zinc/ScalaLocation;Lcom/typesafe/zinc/SbtJars;Ljava/io/File;)Lcom/typesaf
> e/zinc/Setup;
> {code}
> Details: https://gist.github.com/mlem/ddff83cc8a25b040c157
> Current Commit:
> {code}
> C:\development\git\kafka>git rev-parse --verify HEAD
> 71602de0bbf7727f498a812033027f6cbfe34eb8
> {code}
> I am evaluating Kafka for my company and wanted to run some tests with it, 
> but couldn't due to this error. I know Gradle can be tricky and it's not easy 
> to set everything up correctly, but this kind of bug turns potential 
> committers/users off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513225#comment-14513225
 ] 

Neha Narkhede commented on KAFKA-2140:
--

Nice. My assumption was (and I could be wrong) that the Apache mirror needs to 
be made writeable. If that is already true, then a quick note on the mailing 
list about the transition, followed by updating the website with the new 
instructions for contributors, will complete the transition. 

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140-fix.patch, KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : KafkaPreCommit #87

2015-04-26 Thread Apache Jenkins Server
See 



Re: Review Request 33421: Patch for KAFKA-2114

2015-04-26 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33421/#review81634
---

Ship it!


Ship It!

- Neha Narkhede


On April 22, 2015, 12:10 a.m., Gwen Shapira wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33421/
> ---
> 
> (Updated April 22, 2015, 12:10 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2114
> https://issues.apache.org/jira/browse/KAFKA-2114
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> fixed missing configuration and added test
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/KafkaServer.scala 
> c63f4ba9d622817ea8636d4e6135fba917ce085a 
>   core/src/test/scala/unit/kafka/integration/MinIsrConfigTest.scala 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/33421/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>



Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations (Thread 2)

2015-04-26 Thread Jun Rao
Andrii,

Thanks for the update.

For your second point, I agree that if a single AlterTopicRequest can make
multiple changes, there is no need to support the same topic included more
than once in the request.

Now about the semantics in your first question. I was thinking that we can
do the following.
a. If ReplicaAssignment is specified, we expect that this will specify the
replica assignment for all partitions in the topic. For now, we can have
the constraint that there can be more partitions than existing ones, but not
fewer. In this case, both partitions and replicas are ignored. Then
for each partition, we do one of the followings.
a1. If the partition doesn't exist, add the partition with the replica
assignment directly to the topic path in ZK.
a2. If the partition exists and the new replica assignment is not the same
as the existing one, include it in the reassign partition json. If the json
is not empty, write it to the reassignment path in ZK to trigger partition
reassignment.
b. Otherwise, if replicas is specified, generate new ReplicaAssignment for
existing partitions. If partitions is specified (assuming it's larger),
generate ReplicaAssignment for the new partitions as well. Then go back to
step a to make a decision.
c. Otherwise, if only partitions is specified, add assignments of existing
partitions to ReplicaAssignment. Generate assignments to the new partitions
and add them to ReplicaAssignment. Then go back to step a to make a
decision.
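
(For readability only, a rough transcription of steps a/b/c above into a hypothetical Scala sketch; every type, name and helper is an illustrative placeholder and partition ids are assumed to be 0..n-1, so this is not KIP-4 code.)

{code}
object AlterTopicSketch {
  type Assignment = Map[Int, Seq[Int]]   // partition id -> replica list

  // assignReplicas(partition, replicationFactor) is a hypothetical helper that
  // generates a replica list for one partition.
  def resolveTargetAssignment(existing: Assignment,
                              replicaAssignment: Option[Assignment],
                              replicas: Option[Int],
                              partitions: Option[Int],
                              assignReplicas: (Int, Int) => Seq[Int]): Assignment =
    replicaAssignment match {
      case Some(target) =>
        target   // (a): explicit assignment wins; partitions/replicas are ignored
      case None =>
        val targetPartitionCount = partitions.getOrElse(existing.size)
        val rf = replicas.getOrElse(existing.values.headOption.map(_.size).getOrElse(1))
        val base = replicas match {
          case Some(newRf) => existing.keys.map(p => p -> assignReplicas(p, newRf)).toMap  // (b)
          case None        => existing                                                     // (c)
        }
        base ++ (existing.size until targetPartitionCount).map(p => p -> assignReplicas(p, rf))
    }
  // Each partition in the result then follows (a1)/(a2): a brand new partition is
  // written to the topic path in ZK directly, while an existing partition whose
  // assignment changed is batched into the reassignment path, which is written
  // only if the batch is non-empty.
}
{code}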

Thanks,

Jun



On Sat, Apr 25, 2015 at 7:21 AM, Andrii Biletskyi <
andrii.bilets...@stealth.ly> wrote:

> Guys,
>
> Can we come to some agreement in terms of the second item from
> the email above? This blocks me from updating and uploading the
> patch. Also the new schedule for the weekly calls doesn't work very
> well for me - it's 1 am in my timezone :) - so I'd rather we confirm
> everything that is possible by email.
>
> Thanks,
> Andrii Biletskyi
>
> On Wed, Apr 22, 2015 at 5:50 PM, Andrii Biletskyi <
> andrii.bilets...@stealth.ly> wrote:
>
> > As said above, I spent some time thinking about AlterTopicRequest
> > semantics and batching.
> >
> > Firstly, about AlterTopicRequest. Our goal here is to see whether we
> > can suggest some simple semantics and at the same time let users
> > change different things in one instruction (hereinafter an instruction is
> > one of the entries in a batch request).
> > We can resolve arguments according to this schema:
> > 1) If ReplicaAssignment is specified:
> > it's a reassign partitions request
> > 2) If either Partitions or ReplicationFactor is specified:
> > a) If Partitions is specified - this is the increase-partitions case
> > b) If ReplicationFactor is specified - this means we need to automatically
> > regenerate the replica assignment and treat it as a reassign-partitions
> > request
> > Note: this algorithm is a bit inconsistent with the CreateTopicRequest - there,
> > with ReplicaAssignment specified, the user can implicitly define Partitions and
> > ReplicationFactor, whereas in AlterTopicRequest those are completely different
> > things, i.e. you can't include new partitions in the ReplicaAssignment to
> > implicitly ask the controller to increase partitions - the controller will
> > simply return InvalidReplicaAssignment, because you included unknown partitions.
> >
> > Secondly, multiple instructions for one topic in a batch request. I have a
> > feeling this becomes a really big mess now, so suggestions are highly
> > appreciated here!
> > Our goal is to consider whether we can let users add multiple instructions
> > for one topic in one batch, but at the same time make it transparent enough
> > that we can support blocking on request completion; for that we need to be
> > able to determine from the request what the final expected state of the
> > topic is. And the latter seems to me a tough issue.
> > Consider the following AlterTopicRequest:
> > [1) topic1: change ReplicationFactor from 2 to 3,
> >  2) topic1: change ReplicaAssignment (taking into account RF is 3 now),
> >  3) topic2: change ReplicaAssignment (just to include multiple topics)
> >  4) topic1: change ReplicationFactor from 3 to 1,
> >  5) topic1: change ReplicaAssignment again (taking into account RF is 1
> > now)
> > ]
> > As we discussed earlier, the controller will handle it as an alter-topic
> > command plus reassign-partitions. First of all, it will scan all the
> > ReplicaAssignments and assemble them into one json to create the admin path
> > /reassign_partitions once needed.
> > Now, the user would expect us to execute the instructions sequentially, but we
> > can't, because only one reassign-partitions procedure can be in progress at a
> > time - when should we trigger reassign-partitions - after 1) or after 4)? And
> > what about topic2 - we would break the order, even though the instructions
> > were supposed to execute sequentially.
> > Overall, the logic becomes very convoluted, which is a bad sign.
> > Conceptually,
> > I think the root pro

[jira] [Updated] (KAFKA-2114) Unable to change min.insync.replicas default

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2114:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to trunk. Thanks Gwen!

> Unable to change min.insync.replicas default
> 
>
> Key: KAFKA-2114
> URL: https://issues.apache.org/jira/browse/KAFKA-2114
> Project: Kafka
>  Issue Type: Bug
>Reporter: Bryan Baugher
>Assignee: Gwen Shapira
> Fix For: 0.8.2.1
>
> Attachments: KAFKA-2114.patch
>
>
> Following the comment here[1] I was unable to change the min.insync.replicas 
> default value. I tested this by setting up a 3-node cluster, writing to a topic 
> with a replication factor of 3 using request.required.acks=-1, and setting 
> min.insync.replicas=2 in the broker's server.properties. I then shut down 2 
> brokers but I was still able to write successfully. Only after running the 
> alter topic command setting min.insync.replicas=2 on the topic did I see 
> write failures.
> [1] - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3CCANZ-JHF71yqKE6%2BKKhWe2EGUJv6R3bTpoJnYck3u1-M35sobgg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Kafka-trunk #475

2015-04-26 Thread Apache Jenkins Server
See 

Changes:

[nehanarkhede] KAFKA-2140 follow up, checking in newly renamed file 
ConsumerRebalanceFailedException

[nehanarkhede] KAFKA-2128 kafka.Kafka should return non-zero exit code when 
caught exception; reviewed by Neha

--
[...truncated 986 lines...]
kafka.utils.UtilsTest > testReplaceSuffix PASSED

kafka.utils.UtilsTest > testReadInt PASSED

kafka.utils.UtilsTest > testCsvList PASSED

kafka.utils.UtilsTest > testCsvMap PASSED

kafka.utils.UtilsTest > testInLock PASSED

kafka.utils.UtilsTest > testDoublyLinkedList PASSED

kafka.utils.JsonTest > testJsonEncoding PASSED

kafka.utils.IteratorTemplateTest > testIterator PASSED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask PASSED

kafka.utils.timer.TimerTest > testTaskExpiration PASSED

kafka.utils.timer.TimerTaskListTest > testAll PASSED

kafka.metrics.KafkaTimerTest > testKafkaTimer PASSED

kafka.consumer.TopicFilterTest > testWhitelists PASSED

kafka.consumer.TopicFilterTest > testBlacklists PASSED

kafka.consumer.TopicFilterTest > 
testWildcardTopicCountGetTopicCountMapEscapeJson PASSED

kafka.consumer.MetricsTest > testMetricsLeak PASSED

kafka.consumer.MetricsTest > testMetricsReporterAfterDeletingTopic PASSED

kafka.consumer.ZookeeperConsumerConnectorTest > testBasic PASSED

kafka.consumer.ZookeeperConsumerConnectorTest > testCompression PASSED

kafka.consumer.ZookeeperConsumerConnectorTest > testCompressionSetConsumption 
PASSED

kafka.consumer.ZookeeperConsumerConnectorTest > testConsumerDecoder PASSED

kafka.consumer.ZookeeperConsumerConnectorTest > testLeaderSelectionForPartition 
PASSED

kafka.consumer.ZookeeperConsumerConnectorTest > testConsumerRebalanceListener 
PASSED

kafka.consumer.ConsumerIteratorTest > 
testConsumerIteratorDeduplicationDeepIterator PASSED

kafka.consumer.ConsumerIteratorTest > testConsumerIteratorDecodingFailure PASSED

kafka.javaapi.consumer.ZookeeperConsumerConnectorTest > testBasic PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testWrittenEqualsRead PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testIteratorIsConsistent PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > 
testIteratorIsConsistentWithCompression PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytes PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytesWithCompression 
PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEquals PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEqualsWithCompression 
PASSED

kafka.server.LogOffsetTest > testGetOffsetsForUnknownTopic PASSED

kafka.server.LogOffsetTest > testGetOffsetsBeforeLatestTime PASSED

kafka.server.LogOffsetTest > testEmptyLogsGetOffsets PASSED

kafka.server.LogOffsetTest > testGetOffsetsBeforeNow PASSED

kafka.server.LogOffsetTest > testGetOffsetsBeforeEarliestTime PASSED

kafka.server.HighwatermarkPersistenceTest > 
testHighWatermarkPersistenceSinglePartition PASSED

kafka.server.HighwatermarkPersistenceTest > 
testHighWatermarkPersistenceMultiplePartitions PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeHoursProvided PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeMinutesProvided PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeMsProvided PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeNoConfigProvided PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeBothMinutesAndHoursProvided 
PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeBothMinutesAndMsProvided 
PASSED

kafka.server.KafkaConfigTest > testLogRetentionUnlimited PASSED

kafka.server.KafkaConfigTest > testLogRetentionValid PASSED

kafka.server.KafkaConfigTest > testAdvertiseDefaults PASSED

kafka.server.KafkaConfigTest > testAdvertiseConfigured PASSED

kafka.server.KafkaConfigTest > testDuplicateListeners PASSED

kafka.server.KafkaConfigTest > testBadListenerProtocol PASSED

kafka.server.KafkaConfigTest > testListenerDefaults PASSED

kafka.server.KafkaConfigTest > testVersionConfiguration PASSED

kafka.server.KafkaConfigTest > testUncleanLeaderElectionDefault PASSED

kafka.server.KafkaConfigTest > testUncleanElectionDisabled PASSED

kafka.server.KafkaConfigTest > testUncleanElectionEnabled PASSED

kafka.server.KafkaConfigTest > testUncleanElectionInvalid PASSED

kafka.server.KafkaConfigTest > testLogRollTimeMsProvided PASSED

kafka.server.KafkaConfigTest > testLogRollTimeBothMsAndHoursProvided PASSED

kafka.server.KafkaConfigTest > testLogRollTimeNoConfigProvided PASSED

kafka.server.KafkaConfigTest > testDefaultCompressionType PASSED

kafka.server.KafkaConfigTest > testValidCompressionType PASSED

kafka.server.KafkaConfigTest > testInvalidCompressionType PASSED

kafka.server.LogRecoveryTest > testHWCheckpointNoFailuresSingleLogSegment PASSED

kafka.server.LogRecoveryTest > testHWCheckpointWithFailuresSingleLogSegment 
PASSED

kafka.server.LogRecoveryTest > testHWCheckpointNoFailuresMultipleLo

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-26 Thread Jun Rao
A few more minor comments.

100. To make it clear, perhaps we should rename the resource "group" to
consumer-group. We can probably make the same change in CLI as well so that
it's not confused with user group.

101. Currently, create is only at the cluster level. Should it also be at
topic level? For example, perhaps it's useful to allow only user X to
create topic X.

Thanks,

Jun


On Sun, Apr 26, 2015 at 12:36 AM, Gwen Shapira 
wrote:

> Thanks for clarifying, Parth. I think you are taking the right approach
> here.
>
> On Fri, Apr 24, 2015 at 11:46 AM, Parth Brahmbhatt
>  wrote:
> > Sorry Gwen, completely misunderstood the question :-).
> >
> > * Does everyone have the privilege to create a new Group and use it to
> > consume from Topics he's already privileged on?
> > Yes, in the current proposal. I did not see an API to create a group,
> > but if you have READ permission on a TOPIC and WRITE permission on that
> > Group you are free to join and consume.
> >
> >
> > * Will the CLI tool be used to manage group membership too?
> > Yes and I think that means I need to add --group. Updating the
> > KIP. Thanks for pointing this out.
> >
> > * Groups are kind of ephemeral, right? If all consumers in the group
> > disconnect the group is gone, AFAIK. Do we preserve the ACLs? Or do we
> > treat the new group as completely new resource? Can we create ACLs
> > before the group exists, in anticipation of it getting created?
> > I have considered any auto delete and auto create as out of scope for the
> > first release. So right now I was going with preserving the ACLs. Do you
> > see any issues with this? Auto deleting would mean the authorizer will now
> > have to get into implementation details of Kafka, which I was trying to
> > avoid.
> >
> > Thanks
> > Parth
> >
> > On 4/24/15, 11:33 AM, "Gwen Shapira"  wrote:
> >
> >>We are not talking about same Groups :)
> >>
> >>I meant, Groups of consumers (which KIP-11 lists as a separate
> >>resource in the Privilege table)
> >>
> >>On Fri, Apr 24, 2015 at 11:31 AM, Parth Brahmbhatt
> >> wrote:
> >>> I see Groups as something we can add incrementally in the current
> model.
> >>> The acls take principalType: name so groups can be represented as
> group:
> >>> groupName. We are not managing group memberships anywhere in kafka and
> I
> >>> don’t see the need to do so.
> >>>
> >>> So for a topic1 using the CLI an admin can add an acl to grant access
> to
> >>> group:kafka-test-users.
> >>>
> >>> The authorizer implementation can have a plugin to map the authenticated
> >>> user to groups (this is how Hadoop and Storm work). The plugin could be
> >>> mapping users to linux/ldap/active directory groups, but that is again up
> >>> to the implementation.
> >>>
> >>> What we are offering is an interface that is extensible so these
> >>>features
> >>> can be added incrementally. I can add support for this in the first
> >>> release but don’t necessarily see why this would be an absolute necessity.
> >>>
> >>> Thanks
> >>> Parth
> >>>
> >>> On 4/24/15, 11:00 AM, "Gwen Shapira"  wrote:
> >>>
> Thanks.
> 
> One more thing I'm missing in the KIP is details on the Group resource
> (I think we discussed this and it was just not fully updated):
> 
> * Does everyone have the privilege to create a new Group and use it to
> consume from Topics he's already privileged on?
> * Will the CLI tool be used to manage group membership too?
> * Groups are kind of ephemeral, right? If all consumers in the group
> disconnect the group is gone, AFAIK. Do we preserve the ACLs? Or do we
> treat the new group as completely new resource? Can we create ACLs
> before the group exists, in anticipation of it getting created?
> 
> Its all small details, but it will be difficult to implement KIP-11
> without knowing the answers :)
> 
> Gwen
> 
> 
> On Fri, Apr 24, 2015 at 9:58 AM, Parth Brahmbhatt
>  wrote:
> > You are right, moved it to the default implementation section.
> >
> > Thanks
> > Parth
> >
> > On 4/24/15, 9:52 AM, "Gwen Shapira"  wrote:
> >
> >>Sample ACL JSON and Zookeeper is in public API, but I thought it is
> >>part of DefaultAuthorizer (Since Sentry and Argus won't be using
> >>Zookeeper).
> >>Am I wrong? Or is it the KIP?
> >>
> >>On Fri, Apr 24, 2015 at 9:49 AM, Parth Brahmbhatt
> >> wrote:
> >>> Thanks for clarifying Gwen, KIP updated.
> >>>
> >>> I tried to make the distinction by creating a section for all public
> >>> APIs
> >>>
> >>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface#KIP-11-AuthorizationInterface-PublicInterfacesandclasses
> >>>
> >>> Let me know if you think there is a better way to reflect this.
> >>>
> >>> Thanks
> >>> Parth
> >>>
> >>> On 4/24/15, 9:37 AM, "Gwen Shapira"  wrote
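
A minimal sketch of the group-based check discussed in the thread above, assuming hypothetical PrincipalToGroupsMapper and AclStore types (these names are illustrative, not part of the KIP-11 interface): the user is checked directly first, then each group returned by a pluggable mapper, which is the Hadoop/Storm-style model Parth describes.

import java.util.Set;

// Illustrative types only -- not the KIP-11 Authorizer API.
interface PrincipalToGroupsMapper {
    Set<String> groupsFor(String principal);   // e.g. backed by LDAP or unix groups
}

interface AclStore {
    boolean allows(String principalOrGroup, String operation, String resource);
}

class GroupAwareAuthorizer {
    private final PrincipalToGroupsMapper mapper;
    private final AclStore acls;

    GroupAwareAuthorizer(PrincipalToGroupsMapper mapper, AclStore acls) {
        this.mapper = mapper;
        this.acls = acls;
    }

    // Allow the request if either the user itself or any of its groups holds the ACL.
    boolean authorize(String user, String operation, String resource) {
        if (acls.allows("user:" + user, operation, resource))
            return true;
        for (String group : mapper.groupsFor(user))
            if (acls.allows("group:" + group, operation, resource))
                return true;
        return false;
    }
}

Under this model, consuming topic1 in the consumer group kafka-test-users would need READ granted on the topic plus an ACL granted to group:kafka-test-users, matching the CLI example in the thread.
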

[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513265#comment-14513265
 ] 

Neha Narkhede commented on KAFKA-1367:
--

[~jkreps] Yup, issue still exists and the solution I still recommend is to have 
the controller register watches and know the latest ISR for all partitions. 
This change isn't big if someone wants to take a stab. 

> Broker topic metadata not kept in sync with ZooKeeper
> -
>
> Key: KAFKA-1367
> URL: https://issues.apache.org/jira/browse/KAFKA-1367
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Ryan Berdeen
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1367.txt
>
>
> When a broker is restarted, the topic metadata responses from the brokers 
> will be incorrect (different from ZooKeeper) until a preferred replica leader 
> election.
> In the metadata, it looks like leaders are correctly removed from the ISR 
> when a broker disappears, but followers are not. Then, when a broker 
> reappears, the ISR is never updated.
> I used a variation of the Vagrant setup created by Joe Stein to reproduce 
> this with latest from the 0.8.1 branch: 
> https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
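
The fix Neha recommends (controller-side watches so the cached ISR never goes stale) could be wired up roughly as below using the zkclient listener API the controller already depends on. The /brokers/topics/<topic>/partitions/<n>/state path is the standard layout; the class name and callback bodies are placeholders, not the actual controller code.

import org.I0Itec.zkclient.IZkDataListener;
import org.I0Itec.zkclient.ZkClient;

// Sketch only: watch each partition's state znode so the controller's cached
// leader/ISR stays in sync with ZooKeeper instead of going stale until a
// preferred replica election.
public class IsrChangeWatcher {
    private final ZkClient zkClient;

    public IsrChangeWatcher(ZkClient zkClient) {
        this.zkClient = zkClient;
    }

    public void watchPartitionState(String topic, int partition) {
        String path = "/brokers/topics/" + topic + "/partitions/" + partition + "/state";
        zkClient.subscribeDataChanges(path, new IZkDataListener() {
            @Override
            public void handleDataChange(String dataPath, Object data) {
                // data is the leaderAndIsr payload; here the controller would refresh
                // its cache and push updated metadata to all brokers.
            }

            @Override
            public void handleDataDeleted(String dataPath) {
                // partition removed (e.g. topic deletion) -- drop it from the cache.
            }
        });
    }
}
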


Jenkins build is back to normal : Kafka-trunk #476

2015-04-26 Thread Apache Jenkins Server
See 



Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations (Thread 2)

2015-04-26 Thread Andrii Biletskyi
Jun,

I like your approach to AlterTopicRequest semantics! Sounds like
we linearize all request fields to ReplicaAssignment - I will definitely
try this out to ensure there are no other pitfalls.

With regards to multiple instructions in one batch per topic. For me
this sounds reasonable too. We discussed last time that it's pretty
strange we give users schema that supports batching and at the
same time introduce restrictions to the way batching can be used
(in this case - only one instruction per topic). But now, when we give
users everything they need to avoid such misleading use cases (if
we implement the previous item - user will be able to specify/change
all fields in one instruction) - it might be a good justification to
prohibit
serving such requests.

Any objections?

Thanks,
Andrii Biletskyi



On Sun, Apr 26, 2015 at 11:00 PM, Jun Rao  wrote:

> Andrii,
>
> Thanks for the update.
>
> For your second point, I agree that if a single AlterTopicRequest can make
> multiple changes, there is no need to support the same topic included more
> than once in the request.
>
> Now about the semantics in your first question. I was thinking that we can
> do the following.
> a. If ReplicaAssignment is specified, we expect that this will specify the
> replica assignment for all partitions in the topic. For now, we can have
> the constraint that there could be more partitions than existing ones, but
> can't be less. In this case, both partitions and replicas are ignored. Then
> for each partition, we do one of the followings.
> a1. If the partition doesn't exist, add the partition with the replica
> assignment directly to the topic path in ZK.
> a2. If the partition exists and the new replica assignment is not the same
> as the existing one, include it in the reassign partition json. If the json
> is not empty, write it to the reassignment path in ZK to trigger partition
> reassignment.
> b. Otherwise, if replicas is specified, generate new ReplicaAssignment for
> existing partitions. If partitions is specified (assuming it's larger),
> generate ReplicaAssignment for the new partitions as well. Then go back to
> step a to make a decision.
> c. Otherwise, if only partitions is specified, add assignments of existing
> partitions to ReplicaAssignment. Generate assignments to the new partitions
> and add them to ReplicaAssignment. Then go back to step a to make a
> decision.
>
> Thanks,
>
> Jun
>
>
>
> On Sat, Apr 25, 2015 at 7:21 AM, Andrii Biletskyi <
> andrii.bilets...@stealth.ly> wrote:
>
> > Guys,
> >
> > Can we come to some agreement in terms of the second item from
> > the email above? This blocks me from updating and uploading the
> > patch. Also the new schedule for the weekly calls doesn't work very
> > well for me - it's 1 am in my timezone :) - so I'd rather we confirm
> > everything that is possible by email.
> >
> > Thanks,
> > Andrii Biletskyi
> >
> > On Wed, Apr 22, 2015 at 5:50 PM, Andrii Biletskyi <
> > andrii.bilets...@stealth.ly> wrote:
> >
> > > As said above, I spent some time thinking about AlterTopicRequest
> > > semantics and batching.
> > >
> > > Firstly, about AlterTopicRequest. Our goal here is to see whether we
> > > can suggest some simple semantics and at the same time let users
> > > change different things in one instruction (hereinafter instruction -
> is
> > > one of the entries in batch request).
> > > We can resolve arguments according to this schema:
> > > 1) If ReplicaAssignment is specified:
> > > it's a reassign partitions request
> > > 2) If either Partitions or ReplicationFactor is specified:
> > >a) If Partitions specified - this is increase partitions case
> > >b) If ReplicationFactor is specified - this means we need to
> > > automatically
> > >regenerate replica assignment and treat it as reassign partitions
> > > request
> > > Note: this algorithm is a bit inconsistent with the CreateTopicRequest
> -
> > > with
> > > ReplicaAssignment specified there user can implicitly define Partitions
> > > and
> > > ReplicationFactor, in AlterTopicRequest those are completely different
> > > things,
> > > i.e. you can't include new partitions to the ReplicaAssignment to
> > > implicitly ask
> > > controller to increase partitions - controller will simply return
> > > InvalidReplicaAssignment,
> > > because you included unknown partitions.
> > >
> > > Secondly, multiple instructions for one topic in batch request. I have
> a
> > > feeling
> > > it becomes a really big mess now, so suggestions are highly appreciated
> > > here!
> > > Our goal is to consider whether we can let users add multiple
> > instructions
> > > for one topic in one batch but at the same time make it transparent
> > enough
> > > so
> > > we can support blocking on request completion, for that we need to
> > analyze
> > > from the request what is the final expected state of the topic.
> > > And the latter one seems to me a tough issue.
> > > Consider the following AlterTopicRequest:
> > > [1) topic1: 
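
A rough sketch of the a/b/c resolution order Jun describes above: everything is normalized into a single partition-to-replicas map, which can then be diffed against the current assignment in ZooKeeper (steps a1/a2 are omitted here). All names are illustrative, not the KIP-4 implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AlterTopicResolver {

    static Map<Integer, List<Integer>> resolve(Map<Integer, List<Integer>> current,
                                               List<Integer> brokers,
                                               Integer partitions,           // null if not set
                                               Integer replicationFactor,    // null if not set
                                               Map<Integer, List<Integer>> replicaAssignment) {
        // (a) an explicit assignment wins; it must cover at least the existing partitions
        if (replicaAssignment != null) {
            if (replicaAssignment.size() < current.size())
                throw new IllegalArgumentException("cannot shrink the number of partitions");
            return new TreeMap<>(replicaAssignment);
        }

        Map<Integer, List<Integer>> result = new TreeMap<>(current);
        int targetPartitions = (partitions != null) ? partitions : current.size();

        if (replicationFactor != null) {
            // (b) regenerate the assignment for existing partitions and any new ones
            for (int p = 0; p < targetPartitions; p++)
                result.put(p, roundRobin(brokers, p, replicationFactor));
        } else if (partitions != null) {
            // (c) keep existing assignments, only generate replicas for the new partitions
            int rf = current.values().iterator().next().size();
            for (int p = current.size(); p < targetPartitions; p++)
                result.put(p, roundRobin(brokers, p, rf));
        }
        return result;
    }

    // Simplified stand-in for the assignment strategy used at topic creation time.
    private static List<Integer> roundRobin(List<Integer> brokers, int partition, int replicationFactor) {
        List<Integer> replicas = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++)
            replicas.add(brokers.get((partition + i) % brokers.size()));
        return replicas;
    }
}
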

[jira] [Updated] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release

2015-04-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-1054:
---
Attachment: KAFKA-1054-20150426-V1.patch

This is a rebased patch with an additional commit for fixing Scala 2.11 
warnings.

> Eliminate Compilation Warnings for 0.8 Final Release
> 
>
> Key: KAFKA-1054
> URL: https://issues.apache.org/jira/browse/KAFKA-1054
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Guozhang Wang
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1054-20150426-V1.patch, KAFKA-1054.patch, 
> KAFKA-1054_Mar_10_2015.patch
>
>
> Currently we have a total number of 38 warnings for source code compilation 
> of 0.8.
> 1) 3 from "Unchecked type pattern"
> 2) 6 from "Unchecked conversion"
> 3) 29 from "Deprecated Hadoop API functions"
> It's better we finish these before the final release of 0.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release

2015-04-26 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-1054:
---
Attachment: KAFKA-1054-20150426-V2.patch

This is the same as `KAFKA-1054-20150426-V1.patch`, but commit "Removed code 
with no discernable effect that was causing warnings" has been removed.

I am not sure if that is a safe change as the calls that have been removed may 
potentially trigger side-effects (e.g. 
`newMeter("UncleanLeaderElectionsPerSec", "elections", TimeUnit.SECONDS)`).

> Eliminate Compilation Warnings for 0.8 Final Release
> 
>
> Key: KAFKA-1054
> URL: https://issues.apache.org/jira/browse/KAFKA-1054
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Guozhang Wang
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1054-20150426-V1.patch, 
> KAFKA-1054-20150426-V2.patch, KAFKA-1054.patch, KAFKA-1054_Mar_10_2015.patch
>
>
> Currently we have a total number of 38 warnings for source code compilation 
> of 0.8.
> 1) 3 from "Unchecked type pattern"
> 2) 6 from "Unchecked conversion"
> 3) 29 from "Deprecated Hadoop API functions"
> It's better we finish these before the final release of 0.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
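
Ismael's caution holds because, in the Yammer metrics library (metrics-core 2.x, which Kafka's core depends on), registration is itself the side effect: the meter is created and added to the global registry whether or not the return value is used, so deleting a seemingly unused call silently drops the metric. A small stand-alone illustration; the MetricName values here are illustrative.

import java.util.concurrent.TimeUnit;
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.MetricName;

public class MeterRegistrationExample {
    public static void main(String[] args) {
        MetricName name =
            new MetricName("kafka.controller", "ControllerStats", "UncleanLeaderElectionsPerSec");

        // The return value is deliberately ignored: the meter is still created and
        // registered in the default registry as a side effect of this call.
        Metrics.defaultRegistry().newMeter(name, "elections", TimeUnit.SECONDS);

        // The registry now exposes the meter (e.g. via JMX reporters), which is why
        // removing a seemingly unused newMeter(...) call changes runtime behaviour.
        System.out.println(Metrics.defaultRegistry().allMetrics().containsKey(name)); // true
    }
}
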


[jira] [Resolved] (KAFKA-2108) Node deleted all data and re-sync from replicas after attempted upgrade from 0.8.1.1 to 0.8.2.0

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-2108.
--
Resolution: Incomplete

Closing due to inactivity. [~slimunholyone], [~becket_qin] feel free to reopen 
if you have new findings.

> Node deleted all data and re-sync from replicas after attempted upgrade from 
> 0.8.1.1 to 0.8.2.0
> ---
>
> Key: KAFKA-2108
> URL: https://issues.apache.org/jira/browse/KAFKA-2108
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Thunder Stumpges
>
> Per [email 
> thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3CCAA%2BBczTUBqg1-tpcUjwfZgZYZyOXC-Myuhd_2EaGkeKWkrCVUQ%40mail.gmail.com%3E]
>  in user group. 
> We ran into an issue in an attempt to perform a rolling upgrade from 0.8.1.1 
> to 0.8.2.0 (we should have had 0.8.2.1 but got the wrong binaries 
> accidentally). 
> In shutting down the first node, it failed a clean controlled shutdown due to 
> one corrupt topic. The message in server.log was:
> {noformat}
> [2015-03-31 10:21:46,250] INFO [Kafka Server 6], Remaining partitions to 
> move: [__samza_checkpoint_ver_1_for_usersessions_1,0] 
> (kafka.server.KafkaServer)
> [2015-03-31 10:21:46,250] INFO [Kafka Server 6], Error code from controller: 
> 0 (kafka.server.KafkaServer)
> {noformat}
> And related message in state-change.log:
> {noformat}
> [2015-03-31 10:21:42,622] TRACE Controller 6 epoch 23 started leader election 
> for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] 
> (state.change.logger)
> [2015-03-31 10:21:42,623] ERROR Controller 6 epoch 23 encountered error while 
> electing leader for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] 
> due to: LeaderAndIsr information doesn't exist for partition 
> [__samza_checkpoint_ver_1_for_usersessions_1,0] in OnlinePartition state. 
> (state.change.logger)
> [2015-03-31 10:21:42,623] TRACE Controller 6 epoch 23 received response 
> correlationId 2360 for a request sent to broker id:8,host:xxx,port:9092 
> (state.change.logger) 
> [2015-03-31 10:21:42,623] ERROR Controller 6 epoch 23 initiated state change 
> for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] from 
> OnlinePartition to OnlinePartition failed (state.change.logger)
> kafka.common.StateChangeFailedException: encountered error while electing 
> leader for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] due to: 
> LeaderAndIsr information doesn't exist for partition 
> [__samza_checkpoint_ver_1_for_usersessions_1,0] in OnlinePartition state.
>   at 
> kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:360)
>   at 
> kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:187)
>   at 
> kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:125)
>   at 
> kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:124)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:86)
>   at 
> kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:124)
>   at 
> kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1$$anonfun$apply$mcV$sp$3.apply(KafkaController.scala:257)
>   at 
> kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1$$anonfun$apply$mcV$sp$3.apply(KafkaController.scala:253)
>   at scala.Option.foreach(Option.scala:197)
>   at 
> kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1.apply$mcV$sp(KafkaController.scala:253)
>   at 
> kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1.apply(KafkaController.scala:253)
>   at 
> kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1.apply(KafkaController.scala:253)
>   at kafka.utils.Utils$.inLock(Utils.scala:538)
>   at 
> kafka.controller.KafkaController$$anonfun$shutdownBroker$3.apply(KafkaController.scala:252)
>   at 
> kafka.controller.KafkaController$$anonfun$shutdownBroker$3.apply(KafkaController.scala:249)
>   at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:130)
>   at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
>   at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
>   at 
> kafka.controller.KafkaController.shutdownBroker(KafkaController.scala:249)
>   at 
> kafka.server.KafkaApis.handleControlledShutdownRequest(KafkaApis.scala:264)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:192)
>   at kafka.server.KafkaRequestHandler.run(K

[jira] [Commented] (KAFKA-1737) Document required ZkSerializer for ZkClient used with AdminUtils

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513312#comment-14513312
 ] 

Neha Narkhede commented on KAFKA-1737:
--

[~vivekpm] Makes sense. Would you like to upload what you have?

> Document required ZkSerializer for ZkClient used with AdminUtils
> 
>
> Key: KAFKA-1737
> URL: https://issues.apache.org/jira/browse/KAFKA-1737
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.8.1.1
>Reporter: Stevo Slavic
>Assignee: Vivek Madani
>Priority: Minor
> Attachments: KAFKA-1737.patch
>
>
> {{ZkClient}} instances passed to {{AdminUtils}} calls must have 
> {{kafka.utils.ZKStringSerializer}} set as {{ZkSerializer}}. Otherwise 
> commands executed via {{AdminUtils}} may not be seen/recognizable to broker, 
> producer or consumer. E.g. producer (with auto topic creation turned off) 
> will not be able to send messages to a topic created via {{AdminUtils}}, it 
> will throw {{UnknownTopicOrPartitionException}}.
> Please consider at least documenting this requirement in {{AdminUtils}} 
> scaladoc.
> For more info see [related discussion on Kafka user mailing 
> list|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAAUywg-oihNiXuQRYeS%3D8Z3ymsmEHo6ghLs%3Dru4nbm%2BdHVz6TA%40mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
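
A minimal sketch of the usage the ticket asks to have documented, shown from Java; ZKStringSerializer$.MODULE$ is the Java view of the Scala object, and the connection string, timeouts, topic name and counts are placeholders.

import java.util.Properties;
import kafka.admin.AdminUtils;
import kafka.utils.ZKStringSerializer$;
import org.I0Itec.zkclient.ZkClient;

public class CreateTopicExample {
    public static void main(String[] args) {
        // The serializer argument is the point of the ticket: without
        // ZKStringSerializer the znodes written by AdminUtils are not readable
        // by brokers, producers or consumers.
        ZkClient zkClient = new ZkClient("localhost:2181", 30000, 30000,
                ZKStringSerializer$.MODULE$);
        try {
            AdminUtils.createTopic(zkClient, "my-topic", 4, 2, new Properties());
        } finally {
            zkClient.close();
        }
    }
}
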


[jira] [Commented] (KAFKA-2103) kafka.producer.AsyncProducerTest failure.

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513316#comment-14513316
 ] 

Neha Narkhede commented on KAFKA-2103:
--

[~becket_qin] Is this still an issue? If so, can we make this a sub task of the 
umbrella JIRA for unit test failures that [~guozhang] created.

> kafka.producer.AsyncProducerTest failure.
> -
>
> Key: KAFKA-2103
> URL: https://issues.apache.org/jira/browse/KAFKA-2103
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jiangjie Qin
>
> I saw this test consistently failing on trunk.
> The recent changes are KAFKA-2099, KAFKA-1926, KAFKA-1809.
> kafka.producer.AsyncProducerTest > testNoBroker FAILED
> org.scalatest.junit.JUnitTestFailedError: Should fail with 
> FailedToSendMessageException
> at 
> org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101)
> at 
> org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:149)
> at org.scalatest.Assertions$class.fail(Assertions.scala:711)
> at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:149)
> at 
> kafka.producer.AsyncProducerTest.testNoBroker(AsyncProducerTest.scala:300)
> kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED
> kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED
> kafka.producer.AsyncProducerTest > testFailedSendRetryLogic FAILED
> kafka.common.FailedToSendMessageException: Failed to send messages after 
> 3 tries.
> at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:91)
> at 
> kafka.producer.AsyncProducerTest.testFailedSendRetryLogic(AsyncProducerTest.scala:415)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1342) Slow controlled shutdowns can result in stale shutdown requests

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1342:
-
Labels: newbie++  (was: )

> Slow controlled shutdowns can result in stale shutdown requests
> ---
>
> Key: KAFKA-1342
> URL: https://issues.apache.org/jira/browse/KAFKA-1342
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Joel Koshy
>Priority: Blocker
>  Labels: newbie++
> Fix For: 0.9.0
>
>
> I don't think this is a bug introduced in 0.8.1., but triggered by the fact
> that controlled shutdown seems to have become slower in 0.8.1 (will file a
> separate ticket to investigate that). When doing a rolling bounce, it is
> possible for a bounced broker to stop all its replica fetchers since the
> previous PID's shutdown requests are still being handled.
> - 515 is the controller
> - Controlled shutdown initiated for 503
> - Controller starts controlled shutdown for 503
> - The controlled shutdown takes a long time in moving leaders and moving
>   follower replicas on 503 to the offline state.
> - So 503's read from the shutdown channel times out and a new channel is
>   created. It issues another shutdown request.  This request (since it is a
>   new channel) is accepted at the controller's socket server but then waits
>   on the broker shutdown lock held by the previous controlled shutdown which
>   is still in progress.
> - The above step repeats for the remaining retries (six more requests).
> - 503 hits SocketTimeout exception on reading the response of the last
>   shutdown request and proceeds to do an unclean shutdown.
> - The controller's onBrokerFailure call-back fires and moves 503's replicas
>   to offline (not too important in this sequence).
> - 503 is brought back up.
> - The controller's onBrokerStartup call-back fires and moves its replicas
>   (and partitions) to online state. 503 starts its replica fetchers.
> - Unfortunately, the (phantom) shutdown requests are still being handled and
>   the controller sends StopReplica requests to 503.
> - The first shutdown request finally finishes (after 76 minutes in my case!).
> - The remaining shutdown requests also execute and do the same thing (sends
>   StopReplica requests for all partitions to
>   503).
> - The remaining requests complete quickly because they end up not having to
>   touch zookeeper paths - no leaders left on the broker and no need to
>   shrink ISR in zookeeper since it has already been done by the first
>   shutdown request.
> - So in the end-state 503 is up, but effectively idle due to the previous
>   PID's shutdown requests.
> There are some obvious fixes that can be made to controlled shutdown to help
> address the above issue. E.g., we don't really need to move follower
> partitions to Offline. We did that as an "optimization" so the broker falls
> out of ISR sooner - which is helpful when producers set required.acks to -1.
> However it adds a lot of latency to controlled shutdown. Also, (more
> importantly) we should have a mechanism to abort any stale shutdown process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2100) Client Error doesn't preserve or display original server error code when it is an unknown code

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2100:
-
Component/s: clients

> Client Error doesn't preserve or display original server error code when it 
> is an unknown code
> --
>
> Key: KAFKA-2100
> URL: https://issues.apache.org/jira/browse/KAFKA-2100
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: newbie
>
> When the java client receives an unfamiliar error code, it translates it into 
> UNKNOWN(-1, new UnknownServerException("The server experienced an unexpected 
> error when processing the request"))
> This completely loses the original code, which makes troubleshooting from the 
> client impossible. 
> It would be better to preserve the original code and write it to the log when 
> logging the error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
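
The behaviour the description asks for amounts to carrying the raw code through to the log and the error message; a small illustrative sketch, not the actual org.apache.kafka.common.protocol.Errors code.

import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested behaviour: unknown codes still map to a generic error,
// but the original value is preserved in the message/log.
final class ErrorCodes {
    private static final Map<Short, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put((short) 0, "NONE");
        KNOWN.put((short) 3, "UNKNOWN_TOPIC_OR_PARTITION");
        // ... remaining known codes elided ...
    }

    static String describe(short code) {
        String name = KNOWN.get(code);
        if (name != null)
            return name;
        // Keep the raw code so it can be correlated with broker-side logs instead
        // of collapsing every unfamiliar value into a bare UNKNOWN(-1).
        return "UNKNOWN(server error code " + code + ")";
    }
}
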


[jira] [Commented] (KAFKA-2101) Metric metadata-age is reset on a failed update

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513318#comment-14513318
 ] 

Neha Narkhede commented on KAFKA-2101:
--

[~timbrooks] To clarify your question - we shouldn't update the metric on 
failed attempts. 

> Metric metadata-age is reset on a failed update
> ---
>
> Key: KAFKA-2101
> URL: https://issues.apache.org/jira/browse/KAFKA-2101
> Project: Kafka
>  Issue Type: Bug
>Reporter: Tim Brooks
>
> In org.apache.kafka.clients.Metadata there is a lastUpdate() method that 
> returns the time the metadata was last updated. This is only called by the 
> metadata-age metric. 
> However, lastRefreshMs is updated on a failed update (when the 
> MetadataResponse has no valid nodes). This is confusing since the metric's 
> name suggests that it is a true reflection of the age of the current 
> metadata. But the age might be reset by a failed update. 
> Additionally, lastRefreshMs is not reset on a failed update due to no node 
> being available. This seems slightly inconsistent, since one failure 
> condition resets the metric, but another one does not. Especially since both 
> failure conditions do trigger the backoff (for the next attempt).
> I have not implemented a patch yet, because I am unsure what expected 
> behavior is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
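
One way to express Neha's clarification in code is to keep the timestamp that drives backoff separate from the timestamp that drives the metric, as in this rough sketch (field and class names are made up, not the actual Metadata class).

// Backoff is driven by the last *attempt*, while the metadata-age metric is
// driven only by the last *successful* update.
public final class MetadataTimestamps {
    private long lastAttemptMs;           // used only for retry backoff
    private long lastSuccessfulUpdateMs;  // used only by the metadata-age metric

    public synchronized void onUpdateAttempt(long nowMs) {
        lastAttemptMs = nowMs;            // set for both failed and successful attempts
    }

    public synchronized void onSuccessfulUpdate(long nowMs) {
        lastSuccessfulUpdateMs = nowMs;   // only a real update resets the metric's age
    }

    public synchronized long metadataAgeMs(long nowMs) {
        return nowMs - lastSuccessfulUpdateMs;
    }

    public synchronized long timeToNextAttemptMs(long nowMs, long backoffMs) {
        return Math.max(0, backoffMs - (nowMs - lastAttemptMs));
    }
}
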


[jira] [Updated] (KAFKA-2100) Client Error doesn't preserve or display original server error code when it is an unknown code

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2100:
-
Labels: newbie  (was: )

> Client Error doesn't preserve or display original server error code when it 
> is an unknown code
> --
>
> Key: KAFKA-2100
> URL: https://issues.apache.org/jira/browse/KAFKA-2100
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: newbie
>
> When the java client receives an unfamiliar error code, it translates it into 
> UNKNOWN(-1, new UnknownServerException("The server experienced an unexpected 
> error when processing the request"))
> This completely loses the original code, which makes troubleshooting from the 
> client impossible. 
> It would be better to preserve the original code and write it to the log when 
> logging the error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations (Thread 2)

2015-04-26 Thread Jun Rao
Andrii,

Another thing. We decided not to add the lag info in TMR. To be consistent,
we probably also want to remove ISR from TMR since only the leader knows
it. We can punt on adding any new request for getting the ISR. ISR is mostly
useful for monitoring. Currently, one can determine if a replica is in ISR
from the lag metrics (a replica is in ISR if its lag is <=0).

Thanks,

Jun

On Sun, Apr 26, 2015 at 4:31 PM, Andrii Biletskyi <
andrii.bilets...@stealth.ly> wrote:

> Jun,
>
> I like your approach to AlterTopicRequest semantics! Sounds like
> we linearize all request fields to ReplicaAssignment - I will definitely
> try this out to ensure there are no other pitfalls.
>
> With regards to multiple instructions in one batch per topic. For me
> this sounds reasonable too. We discussed last time that it's pretty
> strange we give users schema that supports batching and at the
> same time introduce restrictions to the way batching can be used
> (in this case - only one instruction per topic). But now, when we give
> users everything they need to avoid such misleading use cases (if
> we implement the previous item - user will be able to specify/change
> all fields in one instruction) - it might be a good justification to
> prohibit
> serving such requests.
>
> Any objections?
>
> Thanks,
> Andrii Biletskyi
>
>
>
> On Sun, Apr 26, 2015 at 11:00 PM, Jun Rao  wrote:
>
> > Andrii,
> >
> > Thanks for the update.
> >
> > For your second point, I agree that if a single AlterTopicRequest can
> make
> > multiple changes, there is no need to support the same topic included
> more
> > than once in the request.
> >
> > Now about the semantics in your first question. I was thinking that we
> can
> > do the following.
> > a. If ReplicaAssignment is specified, we expect that this will specify
> the
> > replica assignment for all partitions in the topic. For now, we can have
> > the constraint that there could be more partitions than existing ones,
> but
> > can't be less. In this case, both partitions and replicas are ignored.
> Then
> > for each partition, we do one of the followings.
> > a1. If the partition doesn't exist, add the partition with the replica
> > assignment directly to the topic path in ZK.
> > a2. If the partition exists and the new replica assignment is not the
> same
> > as the existing one, include it in the reassign partition json. If the
> json
> > is not empty, write it to the reassignment path in ZK to trigger
> partition
> > reassignment.
> > b. Otherwise, if replicas is specified, generate new ReplicaAssignment
> for
> > existing partitions. If partitions is specified (assuming it's larger),
> > generate ReplicaAssignment for the new partitions as well. Then go back
> to
> > step a to make a decision.
> > c. Otherwise, if only partitions is specified, add assignments of
> existing
> > partitions to ReplicaAssignment. Generate assignments to the new
> partitions
> > and add them to ReplicaAssignment. Then go back to step a to make a
> > decision.
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Sat, Apr 25, 2015 at 7:21 AM, Andrii Biletskyi <
> > andrii.bilets...@stealth.ly> wrote:
> >
> > > Guys,
> > >
> > > Can we come to some agreement in terms of the second item from
> > > the email above? This blocks me from updating and uploading the
> > > patch. Also the new schedule for the weekly calls doesn't work very
> > > well for me - it's 1 am in my timezone :) - so I'd rather we confirm
> > > everything that is possible by email.
> > >
> > > Thanks,
> > > Andrii Biletskyi
> > >
> > > On Wed, Apr 22, 2015 at 5:50 PM, Andrii Biletskyi <
> > > andrii.bilets...@stealth.ly> wrote:
> > >
> > > > As said above, I spent some time thinking about AlterTopicRequest
> > > > semantics and batching.
> > > >
> > > > Firstly, about AlterTopicRequest. Our goal here is to see whether we
> > > > can suggest some simple semantics and at the same time let users
> > > > change different things in one instruction (hereinafter instruction -
> > is
> > > > one of the entries in batch request).
> > > > We can resolve arguments according to this schema:
> > > > 1) If ReplicaAssignment is specified:
> > > > it's a reassign partitions request
> > > > 2) If either Partitions or ReplicationFactor is specified:
> > > >a) If Partitions specified - this is increase partitions case
> > > >b) If ReplicationFactor is specified - this means we need to
> > > > automatically
> > > >regenerate replica assignment and treat it as reassign partitions
> > > > request
> > > > Note: this algorithm is a bit inconsistent with the
> CreateTopicRequest
> > -
> > > > with
> > > > ReplicaAssignment specified there user can implicitly define
> Partitions
> > > > and
> > > > ReplicationFactor, in AlterTopicRequest those are completely
> different
> > > > things,
> > > > i.e. you can't include new partitions to the ReplicaAssignment to
> > > > implicitly ask
> > > > controller to increase partitions - controller will simply return
> > > > Inval

[jira] [Commented] (KAFKA-1621) Standardize --messages option in perf scripts

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513324#comment-14513324
 ] 

Neha Narkhede commented on KAFKA-1621:
--

[~rekhajoshm] Sorry for the delay, reviewed your PR. However, could you send it 
for trunk instead of 0.8.2?

> Standardize --messages option in perf scripts
> -
>
> Key: KAFKA-1621
> URL: https://issues.apache.org/jira/browse/KAFKA-1621
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Jay Kreps
>  Labels: newbie
>
> This option is specified in PerfConfig and is used by the producer, consumer 
> and simple consumer perf commands. The docstring on the argument does not 
> list it as required but the producer performance test requires it--others 
> don't.
> We should standardize this so that either all the commands require the option 
> and it is marked as required in the docstring or none of them list it as 
> required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1904) run sanity failed test

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1904:
-
Component/s: system tests

> run sanity failed test
> --
>
> Key: KAFKA-1904
> URL: https://issues.apache.org/jira/browse/KAFKA-1904
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Joe Stein
>Priority: Blocker
> Fix For: 0.8.3
>
> Attachments: run_sanity.log.gz
>
>
> _test_case_name  :  testcase_1
> _test_class_name  :  ReplicaBasicTest
> arg : bounce_broker  :  true
> arg : broker_type  :  leader
> arg : message_producing_free_time_sec  :  15
> arg : num_iteration  :  2
> arg : num_messages_to_produce_per_producer_call  :  50
> arg : num_partition  :  2
> arg : replica_factor  :  3
> arg : sleep_seconds_between_producer_calls  :  1
> validation_status  : 
>  Test completed  :  FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2080) quick cleanup of producer performance scripts

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2080:
-
Labels: newbie  (was: )

> quick cleanup of producer performance scripts
> -
>
> Key: KAFKA-2080
> URL: https://issues.apache.org/jira/browse/KAFKA-2080
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: newbie
>
> We have two producer performance tools at the moment: one at 
> o.a.k.client.tools and one at kafka.tools
> bin/kafka-producer-perf-test.sh is calling the kafka.tools one.
> org.apache.kafka.clients.tools.ProducerPerformance has --messages listed as 
> optional (with default) while leaving the parameter out results in an error.
> Cleanup will include:
> * Removing the kafka.tools performance tool
> * Changing the shellscript to use new tool
> * Fix the misleading documentation for --messages
> * Adding both performance tools to the kafka docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2080) quick cleanup of producer performance scripts

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2080:
-
Component/s: tools

> quick cleanup of producer performance scripts
> -
>
> Key: KAFKA-2080
> URL: https://issues.apache.org/jira/browse/KAFKA-2080
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: newbie
>
> We have two producer performance tools at the moment: one at 
> o.a.k.client.tools and one at kafka.tools
> bin/kafka-producer-perf-test.sh is calling the kafka.tools one.
> org.apache.kafka.clients.tools.ProducerPerformance has --messages listed as 
> optional (with default) while leaving the parameter out results in an error.
> Cleanup will include:
> * Removing the kafka.tools performance tool
> * Changing the shellscript to use new tool
> * Fix the misleading documentation for --messages
> * Adding both performance tools to the kafka docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2041) Add ability to specify a KeyClass for KafkaLog4jAppender

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2041:
-
Status: Patch Available  (was: Open)

> Add ability to specify a KeyClass for KafkaLog4jAppender
> 
>
> Key: KAFKA-2041
> URL: https://issues.apache.org/jira/browse/KAFKA-2041
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: Benoy Antony
>Assignee: Jun Rao
> Attachments: KAFKA-2041.patch, kafka-2041-001.patch, 
> kafka-2041-002.patch, kafka-2041-003.patch
>
>
> KafkaLog4jAppender is the Log4j Appender to publish messages to Kafka. 
> Since there is no key or explicit partition number, the messages are sent to 
> random partitions. 
> In some cases, it is possible to derive a key from the message itself. 
> So it may be beneficial to enable KafkaLog4jAppender to accept KeyClass which 
> will provide a key for a given message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
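
A rough idea of what a pluggable key extractor for the appender could look like; the interface name, method shape and example layout are assumptions, not the API proposed in the attached patches.

// Hypothetical key extractor for the log4j appender: derive a partitioning key
// from the rendered log message so related lines land in the same partition.
public interface MessageKeyer {
    // Return the key bytes for a rendered log line, or null for random partitioning.
    byte[] keyFor(String renderedMessage);
}

// Example: key on the thread name embedded at the start of the log line,
// assuming a layout like "[thread] level message".
class ThreadNameKeyer implements MessageKeyer {
    @Override
    public byte[] keyFor(String renderedMessage) {
        int start = renderedMessage.indexOf('[');
        int end = renderedMessage.indexOf(']');
        if (start < 0 || end <= start)
            return null;
        return renderedMessage.substring(start + 1, end)
                .getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }
}
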


[jira] [Updated] (KAFKA-1858) Make ServerShutdownTest a bit less flaky

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1858:
-
Reviewer: Jun Rao
  Status: Patch Available  (was: Open)

> Make ServerShutdownTest a bit less flaky
> 
>
> Key: KAFKA-1858
> URL: https://issues.apache.org/jira/browse/KAFKA-1858
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
> Attachments: KAFKA-1858.patch
>
>
> ServerShutdownTest currently:
> * Starts a KafkaServer
> * Does stuff
> * Stops the server
> * Counts if there are any live kafka threads
> This is fine on its own. But when running in a test suite (i.e. gradle test), 
> the test is very sensitive to whether every other test freed all its resources. If 
> you start a server in a previous test and forget to close it, the 
> ServerShutdownTest will find threads from the previous test and fail.
> This makes for a flaky test that is pretty challenging to troubleshoot.
> I suggest counting the threads at the beginning and end of each test in the 
> class, and only failing if the number at the end is greater than the number 
> at the beginning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
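
The before/after counting suggested in the description can be done with plain JDK thread enumeration; a minimal sketch in Java (the test itself is Scala, and the name filter is an assumption about how "kafka" threads are identified).

import java.util.Set;

// Record the number of live Kafka-named threads before the test and only fail
// if the count grew afterwards, so leaks from earlier tests in the suite do not
// produce false failures here.
public final class ThreadLeakCheck {
    public static long countKafkaThreads() {
        Set<Thread> threads = Thread.getAllStackTraces().keySet();
        return threads.stream()
                .filter(t -> t.getName().toLowerCase().contains("kafka"))
                .count();
    }

    public static void assertNoNewThreads(long countBefore) {
        long countAfter = countKafkaThreads();
        if (countAfter > countBefore)
            throw new AssertionError("Kafka threads leaked by this test: before="
                    + countBefore + " after=" + countAfter);
    }
}
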


[jira] [Updated] (KAFKA-2014) Chaos Monkey / Failure Inducer for Kafka

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2014:
-
Component/s: system tests

> Chaos Monkey / Failure Inducer for Kafka
> 
>
> Key: KAFKA-2014
> URL: https://issues.apache.org/jira/browse/KAFKA-2014
> Project: Kafka
>  Issue Type: Task
>  Components: system tests
>Reporter: Mayuresh Gharat
>Assignee: Mayuresh Gharat
>
> Implement a Chaos Monkey for kafka, that will help us catch any shortcomings 
> in the test environment before going to production. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2003) Add upgrade tests

2015-04-26 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-2003:
-
Component/s: system tests

> Add upgrade tests
> -
>
> Key: KAFKA-2003
> URL: https://issues.apache.org/jira/browse/KAFKA-2003
> Project: Kafka
>  Issue Type: Improvement
>  Components: system tests
>Reporter: Gwen Shapira
>Assignee: Ashish K Singh
>
> To test protocol changes, compatibility and upgrade process, we need a good 
> way to test different versions of the product together and to test end-to-end 
> upgrade process.
> For example, for 0.8.2 to 0.8.3 test we want to check:
> * Can we start a cluster with a mix of 0.8.2 and 0.8.3 brokers?
> * Can a cluster of 0.8.3 brokers bump the protocol level one broker at a time?
> * Can 0.8.2 clients run against a cluster of 0.8.3 brokers?
> There are probably more questions. But an automated framework that can test 
> those and report results will be a good start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2118) Cleaner cannot clean after shutdown during replaceSegments

2015-04-26 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-2118:
---
   Resolution: Fixed
Fix Version/s: 0.8.3
   Status: Resolved  (was: Patch Available)

Thanks for the latest patch. +1 and committed to trunk.

> Cleaner cannot clean after shutdown during replaceSegments
> --
>
> Key: KAFKA-2118
> URL: https://issues.apache.org/jira/browse/KAFKA-2118
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Gian Merlino
>Assignee: Rajini Sivaram
> Fix For: 0.8.3
>
> Attachments: KAFKA-2118.patch, KAFKA-2118_2015-04-15_09:43:51.patch, 
> KAFKA-2118_2015-04-19_19:02:38.patch
>
>
> If a broker shuts down after the cleaner calls replaceSegments with more than 
> one segment, the partition can be left in an uncleanable state. We saw this 
> on a few brokers after doing a rolling update. The sequence of things we saw 
> is:
> 1) Cleaner cleaned segments with base offsets 0, 1094621529, and 1094831997 
> into a new segment 0.
> 2) Cleaner logged "Swapping in cleaned segment 0 for segment(s) 
> 0,1094621529,1094831997 in log xxx-15." and called replaceSegments.
> 3) 0.cleaned was renamed to 0.swap.
> 4) Broker shut down before deleting segments 1094621529 and 1094831997.
> 5) Broker started up and logged "Found log file 
> /mnt/persistent/kafka-logs/xxx-15/.log.swap from 
> interrupted swap operation, repairing."
> 6) Cleaner thread died with the exception 
> "kafka.common.InvalidOffsetException: Attempt to append an offset 
> (1094911424) to position 1003 no larger than the last offset appended 
> (1095045873) to 
> /mnt/persistent/kafka-logs/xxx-15/.index.cleaned."
> I think what's happening in #6 is that when the broker started back up and 
> repaired the log, segment 0 ended up with a bunch of messages that were also 
> in segment 1094621529 and 1094831997 (because the new segment 0 was created 
> from cleaning all 3). But segments 1094621529 and 1094831997 were still on 
> disk, so offsets on disk were no longer monotonically increasing, violating 
> the assumption of OffsetIndex. We ended up fixing this by deleting segments 
> 1094621529 and 1094831997 manually, and then removing the line for this 
> partition from the cleaner-offset-checkpoint file (otherwise it would 
> reference the non-existent segment 1094621529).
> This can happen even on a clean shutdown (the async deletes in 
> replaceSegments might not happen).
> Cleaner logs post-startup:
> 2015-04-12 15:07:56,533 INFO [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - Cleaner 0: Beginning cleaning of log xxx-15.
> 2015-04-12 15:07:56,533 INFO [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - Cleaner 0: Building offset map for xxx-15...
> 2015-04-12 15:07:56,595 INFO [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - Cleaner 0: Building offset map for log xxx-15 for 6 
> segments in offset range [1094621529, 1095924157).
> 2015-04-12 15:08:01,443 INFO [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - Cleaner 0: Offset map for log xxx-15 complete.
> 2015-04-12 15:08:01,443 INFO [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - Cleaner 0: Cleaning log xxx-15 (discarding tombstones 
> prior to Sun Apr 12 14:05:37 UTC 2015)...
> 2015-04-12 15:08:01,443 INFO [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - Cleaner 0: Cleaning segment 0 in log xxx-15 (last 
> modified Sun Apr 12 14:05:38 UTC 2015) into 0, retaining deletes.
> 2015-04-12 15:08:04,283 INFO [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - Cleaner 0: Cleaning segment 1094621529 in log xxx-15 
> (last modified Sun Apr 12 13:49:27 UTC 2015) into 0, discarding deletes.
> 2015-04-12 15:08:05,079 INFO [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - Cleaner 0: Cleaning segment 1094831997 in log xxx-15 
> (last modified Sun Apr 12 14:04:28 UTC 2015) into 0, discarding deletes.
> 2015-04-12 15:08:05,157 ERROR [kafka-log-cleaner-thread-0] 
> kafka.log.LogCleaner - [kafka-log-cleaner-thread-0], Error due to
> kafka.common.InvalidOffsetException: Attempt to append an offset (1094911424) 
> to position 1003 no larger than the last offset appended (1095045873) to 
> /mnt/persistent/kafka-logs/xxx-15/.index.
> cleaned.
> at kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
> at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> at kafka.utils.Utils$.inLock(Utils.scala:535)
> at kafka.log.OffsetIndex.append(OffsetIndex.scala:197)
> at kafka.log.LogSegment.append(LogSegment.scala:81)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:427)
> at kafka.log.Cleaner$$anonfun$cleanSegments$1.apply

Re: Review Request 32937: Patch for KAFKA-2102

2015-04-26 Thread Tim Brooks

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32937/
---

(Updated April 27, 2015, 12:26 a.m.)


Review request for kafka.


Bugs: KAFKA-2102
https://issues.apache.org/jira/browse/KAFKA-2102


Repository: kafka


Description (updated)
---

Method does not need to be synchronized


Do not synchronize contains topic method


Continue removing the need to synchronize the metadata object


Store both last refresh and need to refresh in same variable


Fix synchronize issue


Version needs to be volatile


rework how signally happens


remove unnecessary creation of new set


initialize 0 at the field level


Fix the build


Start moving synchronization of metadata to different class


Start moving synchronization work to new class


Remove unused code


Functionality works. Not threadsafe


move version into metadata synchronizer


Make version volatile


Rename classes


move to finergrained locking


Use locks in bookkeeper


Only use atomic variabled


use successful metadata in metrics


Change these things back to trunk


Address issues with patch


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/Metadata.java 
07f1cdb1fe920b0c7a5f2d101ddc40c689e1b247 
  clients/src/main/java/org/apache/kafka/clients/MetadataBookkeeper.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/NetworkClient.java 
b7ae595f2cc46e5dfe728bc3ce6082e9cd0b6d36 
  clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
42b12928781463b56fc4a45d96bb4da2745b6d95 
  clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 
b2db91ca14bbd17fef5ce85839679144fff3f689 

Diff: https://reviews.apache.org/r/32937/diff/


Testing
---


Thanks,

Tim Brooks



[jira] [Commented] (KAFKA-2102) Remove unnecessary synchronization when managing metadata

2015-04-26 Thread Tim Brooks (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513346#comment-14513346
 ] 

Tim Brooks commented on KAFKA-2102:
---

Updated reviewboard https://reviews.apache.org/r/32937/diff/
 against branch origin/trunk

> Remove unnecessary synchronization when managing metadata
> -
>
> Key: KAFKA-2102
> URL: https://issues.apache.org/jira/browse/KAFKA-2102
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Tim Brooks
>Assignee: Tim Brooks
> Attachments: KAFKA-2102.patch, KAFKA-2102_2015-04-08_00:20:33.patch, 
> KAFKA-2102_2015-04-15_19:55:45.patch, KAFKA-2102_2015-04-26_17:25:21.patch, 
> eight-threads-patch.txt, eight-threads-trunk.txt, five-threads-patch.txt, 
> five-threads-trunk.txt
>
>
> Usage of the org.apache.kafka.clients.Metadata class is synchronized. It 
> seems like the current functionality could be maintained without 
> synchronizing the whole class.
> I have been working on improving this by moving to finer grained locks and 
> using atomic operations. My initial benchmarking of the producer suggests that this 
> will improve latency (using HDRHistogram) on submitting messages.
> I have produced an initial patch. I do not necessarily believe this is 
> complete. And I want to definitely produce some more benchmarks. However, I 
> wanted to get early feedback because this change could be deceptively tricky.
> I am interested in knowing if this is:
> 1. Something that is of interest to the maintainers/community.
> 2. Along the right track
> 3. If there are any gotchas that make my current approach naive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
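
The direction described in the ticket (finer-grained locks plus atomics and a volatile version) can be illustrated with a small self-contained class: reads and version checks stay lock-free while writers serialize on one lock. This is not the patch itself; the names are made up, not the actual Metadata/MetadataBookkeeper code.

import java.util.concurrent.atomic.AtomicLong;

public final class VersionedValue<T> {
    private final Object writeLock = new Object();
    private final AtomicLong version = new AtomicLong(0);
    private volatile T value;

    public VersionedValue(T initial) {
        this.value = initial;
    }

    // Lock-free read used by hot paths such as producer sends.
    public T get() {
        return value;
    }

    // Lock-free version check, e.g. to decide whether a refresh completed.
    public long version() {
        return version.get();
    }

    // Writers (the metadata updater) still coordinate through one lock.
    public void update(T newValue) {
        synchronized (writeLock) {
            value = newValue;
            version.incrementAndGet();
        }
    }
}
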


[jira] [Updated] (KAFKA-2102) Remove unnecessary synchronization when managing metadata

2015-04-26 Thread Tim Brooks (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Brooks updated KAFKA-2102:
--
Attachment: KAFKA-2102_2015-04-26_17:25:21.patch

> Remove unnecessary synchronization when managing metadata
> -
>
> Key: KAFKA-2102
> URL: https://issues.apache.org/jira/browse/KAFKA-2102
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Tim Brooks
>Assignee: Tim Brooks
> Attachments: KAFKA-2102.patch, KAFKA-2102_2015-04-08_00:20:33.patch, 
> KAFKA-2102_2015-04-15_19:55:45.patch, KAFKA-2102_2015-04-26_17:25:21.patch, 
> eight-threads-patch.txt, eight-threads-trunk.txt, five-threads-patch.txt, 
> five-threads-trunk.txt
>
>
> Usage of the org.apache.kafka.clients.Metadata class is synchronized. It 
> seems like the current functionality could be maintained without 
> synchronizing the whole class.
> I have been working on improving this by moving to finer grained locks and 
> using atomic operations. My initial benchmarking of the producer suggests that this 
> will improve latency (using HDRHistogram) on submitting messages.
> I have produced an initial patch. I do not necessarily believe this is 
> complete. And I want to definitely produce some more benchmarks. However, I 
> wanted to get early feedback because this change could be deceptively tricky.
> I am interested in knowing if this is:
> 1. Something that is of interest to the maintainers/community.
> 2. Along the right track
> 3. If there are any gotchas that make my current approach naive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2140) Improve code readability

2015-04-26 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513374#comment-14513374
 ] 

Ewen Cheslack-Postava commented on KAFKA-2140:
--

Not sure if it's changed at all, but this blog post suggests we need to file a 
ticket with the infrastructure team: 
https://blogs.apache.org/infra/entry/improved_integration_between_apache_and 
(It also describes some of the improvements/integrations they made, which 
people might want to know so they understand the workflow before we transition 
over.)

> Improve code readability
> 
>
> Key: KAFKA-2140
> URL: https://issues.apache.org/jira/browse/KAFKA-2140
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Minor
> Attachments: KAFKA-2140-fix.patch, KAFKA-2140.patch
>
>
> There are a number of places where code could be written in a more readable 
> and idiomatic form. It's easier to explain with a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Kafka-trunk #477

2015-04-26 Thread Apache Jenkins Server
See 

Changes:

[junrao] kafka-2118; Cleaner cannot clean after shutdown during 
replaceSegments; patched by Rajini Sivaram; reviewed by Jun Rao

--
[...truncated 667 lines...]
kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testAppendMessageWithNullPayload PASSED

kafka.log.LogTest > testCorruptLog PASSED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.log.LogTest > testParseTopicPartitionName PASSED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName PASSED

kafka.log.LogTest > testParseTopicPartitionNameForNull PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingSeparator PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition PASSED

kafka.log.LogSegmentTest > testReadOnEmptySegment PASSED

kafka.log.LogSegmentTest > testReadBeforeFirstOffset PASSED

kafka.log.LogSegmentTest > testMaxOffset PASSED

kafka.log.LogSegmentTest > testReadAfterLast PASSED

kafka.log.LogSegmentTest > testReadFromGap PASSED

kafka.log.LogSegmentTest > testTruncate PASSED

kafka.log.LogSegmentTest > testTruncateFull PASSED

kafka.log.LogSegmentTest > testNextOffsetCalculation PASSED

kafka.log.LogSegmentTest > testChangeFileSuffixes PASSED

kafka.log.LogSegmentTest > testRecoveryFixesCorruptIndex PASSED

kafka.log.LogSegmentTest > testRecoveryWithCorruptMessage PASSED

kafka.log.CleanerTest > testCleanSegments PASSED

kafka.log.CleanerTest > testCleaningWithDeletes PASSED

kafka.log.CleanerTest > testCleaningWithUnkeyedMessages PASSED

kafka.log.CleanerTest > testCleanSegmentsWithAbort PASSED

kafka.log.CleanerTest > testSegmentGrouping PASSED

kafka.log.CleanerTest > testSegmentGroupingWithSparseOffsets PASSED

kafka.log.CleanerTest > testBuildOffsetMap PASSED

kafka.log.CleanerTest > testRecoveryAfterCrash PASSED

kafka.log.FileMessageSetTest > testWrittenEqualsRead PASSED

kafka.log.FileMessageSetTest > testIteratorIsConsistent PASSED

kafka.log.FileMessageSetTest > testSizeInBytes PASSED

kafka.log.FileMessageSetTest > testWriteTo PASSED

kafka.log.FileMessageSetTest > testTruncate PASSED

kafka.log.FileMessageSetTest > testFileSize PASSED

kafka.log.FileMessageSetTest > testIterationOverPartialAndTruncation PASSED

kafka.log.FileMessageSetTest > testIterationDoesntChangePosition PASSED

kafka.log.FileMessageSetTest > testRead PASSED

kafka.log.FileMessageSetTest > testSearch PASSED

kafka.log.FileMessageSetTest > testIteratorWithLimits PASSED

kafka.log.OffsetMapTest > testBasicValidation PASSED

kafka.log.OffsetMapTest > testClear PASSED

kafka.log.OffsetIndexTest > truncate PASSED

kafka.log.OffsetIndexTest > randomLookupTest PASSED

kafka.log.OffsetIndexTest > lookupExtremeCases PASSED

kafka.log.OffsetIndexTest > appendTooMany PASSED

kafka.log.OffsetIndexTest > appendOutOfOrder PASSED

kafka.log.OffsetIndexTest > testReopen PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[1] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[2] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[6] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[7] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[18] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[19] PASSED

kafka.log.LogManagerTest > testCreateLog PASSED

kafka.log.LogManagerTest > testGetNonExistentLog PASSED

kafka.log.LogManagerTest > testCleanupExpiredSegments PASSED

kafka.log.LogManagerTest > testCleanupSegmentsToMaintainSize PASSED

kafka.log.LogManagerTest > testTimeBasedFlush PASSED

kafka.log.LogManagerTest > testLeastLoadedAssignment PASSED

kafka.log.LogManagerTest > testTwoLogManagersUsingSameDirFails PASSED

kafka.log.LogManagerTest > testCheckpointRecoveryPoints PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithTrailing

[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2015-04-26 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513387#comment-14513387
 ] 

Ashish K Singh commented on KAFKA-1367:
---

[~nehanarkhede] If no one has already started working on this I would like to 
take a stab at it.

> Broker topic metadata not kept in sync with ZooKeeper
> -
>
> Key: KAFKA-1367
> URL: https://issues.apache.org/jira/browse/KAFKA-1367
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Ryan Berdeen
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1367.txt
>
>
> When a broker is restarted, the topic metadata responses from the brokers 
> will be incorrect (different from ZooKeeper) until a preferred replica leader 
> election.
> In the metadata, it looks like leaders are correctly removed from the ISR 
> when a broker disappears, but followers are not. Then, when a broker 
> reappears, the ISR is never updated.
> I used a variation of the Vagrant setup created by Joe Stein to reproduce 
> this with latest from the 0.8.1 branch: 
> https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2128) kafka.Kafka should return non-zero exit code when caught exception.

2015-04-26 Thread Sasaki Toru (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513408#comment-14513408
 ] 

Sasaki Toru commented on KAFKA-2128:


Thanks [~gwenshap] and [~aauradkar] for the reviews.

> kafka.Kafka should return non-zero exit code when caught exception.
> ---
>
> Key: KAFKA-2128
> URL: https://issues.apache.org/jira/browse/KAFKA-2128
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.8.3
>Reporter: Sasaki Toru
>Priority: Minor
> Fix For: 0.8.3
>
> Attachments: KAFKA-2128-1.patch
>
>
> The kafka.Kafka object always returns exit code zero.
> It should return a non-zero exit code when an exception is caught
> (for example, a FileNotFoundException because server.properties does not 
> exist).
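
For illustration, a minimal sketch of the proposed behaviour: a hypothetical 
entry point (not the actual kafka.Kafka object; loadProps/startServer are made 
up names) that exits with a non-zero status when startup throws.

{code}
object KafkaMainSketch {
  def main(args: Array[String]): Unit = {
    try {
      val props = loadProps(args(0)) // throws FileNotFoundException if the file is missing
      startServer(props)
    } catch {
      case e: Throwable =>
        System.err.println("Fatal error during startup: " + e.getMessage)
        System.exit(1) // non-zero exit code signals the failure to callers/scripts
    }
  }

  private def loadProps(path: String): java.util.Properties = {
    val props = new java.util.Properties()
    val in = new java.io.FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }

  // Placeholder for broker startup; illustrative only.
  private def startServer(props: java.util.Properties): Unit = ()
}
{code}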



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1595) Remove deprecated and slower scala JSON parser from kafka.consumer.TopicCount

2015-04-26 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513420#comment-14513420
 ] 

Jun Rao commented on KAFKA-1595:


[~ijuma], we discussed the json library in the context of adding an admin 
client. The admin client will be written in Java and will probably need some 
json parsing (this will become clearer as we develop KIP-4). So, we need a Java 
json library and decided to use FastXML. FastXML probably has more 
dependencies; however, given that it's widely used and has been around for some 
time, I assume that it has to maintain API compatibility. If we do have FastXML 
in the client, perhaps it will be better to reuse that on the broker side for 
consistency?

As for the old clients, the earliest that we can deprecate them is probably one 
release after the new consumer is released.
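
For context, a minimal sketch of JSON parsing with Jackson (the FasterXML 
library referred to as FastXML above, as far as I can tell); this is purely 
illustrative and not the parser code Kafka will ship.

{code}
import com.fasterxml.jackson.databind.ObjectMapper
import scala.collection.JavaConverters._

object JsonParsingSketch {
  def main(args: Array[String]): Unit = {
    val json = """{"version":1,"partitions":[{"topic":"test","partition":0}]}"""
    val mapper = new ObjectMapper()
    val root = mapper.readTree(json)              // parse into a JsonNode tree

    val version = root.get("version").asInt()
    val topics = root.get("partitions").asScala.map(_.get("topic").asText())

    println(s"version=$version topics=${topics.mkString(",")}")
  }
}
{code}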

> Remove deprecated and slower scala JSON parser from kafka.consumer.TopicCount
> -
>
> Key: KAFKA-1595
> URL: https://issues.apache.org/jira/browse/KAFKA-1595
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.1.1
>Reporter: Jagbir
>Assignee: Ismael Juma
>  Labels: newbie
> Fix For: 0.8.3
>
> Attachments: KAFKA-1595.patch
>
>
> The following issue is created as a follow up suggested by Jun Rao
> in a kafka news group message with the Subject
> "Blocking Recursive parsing from 
> kafka.consumer.TopicCount$.constructTopicCount"
> SUMMARY:
> An issue was detected in a typical cluster of 3 kafka instances backed
> by 3 zookeeper instances (kafka version 0.8.1.1, scala version 2.10.3,
> java version 1.7.0_65). On consumer end, when consumers get recycled,
> there is a troubling JSON parsing recursion which takes a busy lock and
> blocks consumers thread pool.
> In 0.8.1.1 scala client library ZookeeperConsumerConnector.scala:355 takes
> a global lock (0xd3a7e1d0) during the rebalance, and fires an
> expensive JSON parsing, while keeping the other consumers from shutting
> down, see, e.g,
> at 
> kafka.consumer.ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:161)
> The deep recursive JSON parsing should be deprecated in favor
> of a better JSON parser, see, e.g,
> http://engineering.ooyala.com/blog/comparing-scala-json-libraries?
> DETAILS:
> The first dump is for a recursive blocking thread holding the lock for 
> 0xd3a7e1d0
> and the subsequent dump is for a waiting thread.
> (Please grep for 0xd3a7e1d0 to see the locked object.)
> 
> -8<-
> "Sa863f22b1e5hjh6788991800900b34545c_profile-a-prod1-s-140789080845312-c397945e8_watcher_executor"
> prio=10 tid=0x7f24dc285800 nid=0xda9 runnable [0x7f249e40b000]
> java.lang.Thread.State: RUNNABLE
> at 
> scala.util.parsing.combinator.Parsers$$anonfun$rep1$1.p$7(Parsers.scala:722)
> at 
> scala.util.parsing.combinator.Parsers$$anonfun$rep1$1.continue$1(Parsers.scala:726)
> at 
> scala.util.parsing.combinator.Parsers$$anonfun$rep1$1.apply(Parsers.scala:737)
> at 
> scala.util.parsing.combinator.Parsers$$anonfun$rep1$1.apply(Parsers.scala:721)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Success.flatMapWithNext(Parsers.scala:142)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:239)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:239)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:239)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:239)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.a

[jira] [Commented] (KAFKA-1595) Remove deprecated and slower scala JSON parser from kafka.consumer.TopicCount

2015-04-26 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513441#comment-14513441
 ] 

Gwen Shapira commented on KAFKA-1595:
-

I'm a bit concerned about choosing a library *because* it is not popular... the 
popular ones get more bug fixes :)

I totally agree we need to choose a library for the server and that it doesn't 
need to be the same one as the client. I think the right place for the 
choosing-a-json-library discussion is the dev mailing list. 

Re: API compatibility: as usual, FastXML is compatible within major releases, 
but if there are two completely different versions, it will break (same as 
Codahale Metrics).

> Remove deprecated and slower scala JSON parser from kafka.consumer.TopicCount
> -
>
> Key: KAFKA-1595
> URL: https://issues.apache.org/jira/browse/KAFKA-1595
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.1.1
>Reporter: Jagbir
>Assignee: Ismael Juma
>  Labels: newbie
> Fix For: 0.8.3
>
> Attachments: KAFKA-1595.patch
>
>
> The following issue is created as a follow up suggested by Jun Rao
> in a kafka news group message with the Subject
> "Blocking Recursive parsing from 
> kafka.consumer.TopicCount$.constructTopicCount"
> SUMMARY:
> An issue was detected in a typical cluster of 3 kafka instances backed
> by 3 zookeeper instances (kafka version 0.8.1.1, scala version 2.10.3,
> java version 1.7.0_65). On consumer end, when consumers get recycled,
> there is a troubling JSON parsing recursion which takes a busy lock and
> blocks consumers thread pool.
> In 0.8.1.1 scala client library ZookeeperConsumerConnector.scala:355 takes
> a global lock (0xd3a7e1d0) during the rebalance, and fires an
> expensive JSON parsing, while keeping the other consumers from shutting
> down, see, e.g,
> at 
> kafka.consumer.ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:161)
> The deep recursive JSON parsing should be deprecated in favor
> of a better JSON parser, see, e.g,
> http://engineering.ooyala.com/blog/comparing-scala-json-libraries?
> DETAILS:
> The first dump is for a recursive blocking thread holding the lock for 
> 0xd3a7e1d0
> and the subsequent dump is for a waiting thread.
> (Please grep for 0xd3a7e1d0 to see the locked object.)
> 
> -8<-
> "Sa863f22b1e5hjh6788991800900b34545c_profile-a-prod1-s-140789080845312-c397945e8_watcher_executor"
> prio=10 tid=0x7f24dc285800 nid=0xda9 runnable [0x7f249e40b000]
> java.lang.Thread.State: RUNNABLE
> at 
> scala.util.parsing.combinator.Parsers$$anonfun$rep1$1.p$7(Parsers.scala:722)
> at 
> scala.util.parsing.combinator.Parsers$$anonfun$rep1$1.continue$1(Parsers.scala:726)
> at 
> scala.util.parsing.combinator.Parsers$$anonfun$rep1$1.apply(Parsers.scala:737)
> at 
> scala.util.parsing.combinator.Parsers$$anonfun$rep1$1.apply(Parsers.scala:721)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Success.flatMapWithNext(Parsers.scala:142)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:239)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:239)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:239)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:239)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> at scala.util.parsing.combinator.Parsers$$a

[jira] [Created] (KAFKA-2151) make MockMetricsReporter a little more generic

2015-04-26 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created KAFKA-2151:
-

 Summary: make MockMetricsReporter a little more generic
 Key: KAFKA-2151
 URL: https://issues.apache.org/jira/browse/KAFKA-2151
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Reporter: Steven Zhen Wu
Assignee: Steven Zhen Wu


This is a follow-up improvement on the test code related to KAFKA-2121. Since 
we moved MockMetricsReporter into a generic/public location, it's better to 
make it a little more generic. KafkaProducerTest and KafkaConsumerTest are 
updated accordingly.

[~ewencp], since you are familiar with KAFKA-2121, I will ask you to review. 
Should be an easy one :)
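
For illustration, a rough sketch of what a slightly more generic mock could 
look like, written against the public MetricsReporter interface; this is not 
the actual MockMetricsReporter from the patch, just an example that counts 
lifecycle calls so different tests can assert on them.

{code}
import java.util.concurrent.atomic.AtomicInteger
import org.apache.kafka.common.metrics.{KafkaMetric, MetricsReporter}

class CountingMockReporter extends MetricsReporter {
  // Counters that producer/consumer tests can inspect after construction/close.
  val initCount = new AtomicInteger(0)
  val changeCount = new AtomicInteger(0)
  val closeCount = new AtomicInteger(0)

  override def configure(configs: java.util.Map[String, _]): Unit = ()

  override def init(metrics: java.util.List[KafkaMetric]): Unit = {
    initCount.incrementAndGet()
  }

  override def metricChange(metric: KafkaMetric): Unit = {
    changeCount.incrementAndGet()
  }

  override def close(): Unit = {
    closeCount.incrementAndGet()
  }
}
{code}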



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2151) make MockMetricsReporter a little more generic

2015-04-26 Thread Steven Zhen Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513448#comment-14513448
 ] 

Steven Zhen Wu commented on KAFKA-2151:
---

Created reviewboard https://reviews.apache.org/r/33574/diff/
 against branch apache/trunk

> make MockMetricsReporter a little more generic
> --
>
> Key: KAFKA-2151
> URL: https://issues.apache.org/jira/browse/KAFKA-2151
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Steven Zhen Wu
>Assignee: Steven Zhen Wu
> Attachments: KAFKA-2151.patch
>
>
> this is a follow-up improvement on the test code related KAFKA-2121. since we 
> moved MockMetricsReporter into a generic/public location, it's better to make 
> it a little more generic. updated KafkaProducerTest and KafkaConsumerTest 
> accordingly.
> [~ewencp] since you are familiar with the KAFKA-2121. will ask you as 
> reviewer. should be an easy one :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2151) make MockMetricsReporter a little more generic

2015-04-26 Thread Steven Zhen Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Zhen Wu updated KAFKA-2151:
--
Attachment: KAFKA-2151.patch

> make MockMetricsReporter a little more generic
> --
>
> Key: KAFKA-2151
> URL: https://issues.apache.org/jira/browse/KAFKA-2151
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Steven Zhen Wu
>Assignee: Steven Zhen Wu
> Attachments: KAFKA-2151.patch
>
>
> this is a follow-up improvement on the test code related KAFKA-2121. since we 
> moved MockMetricsReporter into a generic/public location, it's better to make 
> it a little more generic. updated KafkaProducerTest and KafkaConsumerTest 
> accordingly.
> [~ewencp] since you are familiar with the KAFKA-2121. will ask you as 
> reviewer. should be an easy one :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 33574: Patch for KAFKA-2151

2015-04-26 Thread Steven Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33574/
---

Review request for kafka.


Bugs: KAFKA-2151
https://issues.apache.org/jira/browse/KAFKA-2151


Repository: kafka


Description
---

make MockMetricsReporter a little more generic


Diffs
-

  
clients/src/test/java/org/apache/kafka/clients/consumer/KafkaConsumerTest.java 
eea2c28450736d1668c68828f77a49470a82c3d0 
  
clients/src/test/java/org/apache/kafka/clients/producer/KafkaProducerTest.java 
49f1427bcbe43c773920a25aa69a71d0329296b7 
  clients/src/test/java/org/apache/kafka/test/MockMetricsReporter.java 
6f948f240c906029a0f972bf770f288f390ea714 

Diff: https://reviews.apache.org/r/33574/diff/


Testing
---


Thanks,

Steven Wu



[jira] [Updated] (KAFKA-2151) make MockMetricsReporter a little more generic

2015-04-26 Thread Steven Zhen Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Zhen Wu updated KAFKA-2151:
--
Status: Patch Available  (was: Open)

> make MockMetricsReporter a little more generic
> --
>
> Key: KAFKA-2151
> URL: https://issues.apache.org/jira/browse/KAFKA-2151
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Steven Zhen Wu
>Assignee: Steven Zhen Wu
> Attachments: KAFKA-2151.patch
>
>
> this is a follow-up improvement on the test code related KAFKA-2121. since we 
> moved MockMetricsReporter into a generic/public location, it's better to make 
> it a little more generic. updated KafkaProducerTest and KafkaConsumerTest 
> accordingly.
> [~ewencp] since you are familiar with the KAFKA-2121. will ask you as 
> reviewer. should be an easy one :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2139) Add a separate controller message queue with higher priority on broker side

2015-04-26 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513453#comment-14513453
 ] 

Jiangjie Qin commented on KAFKA-2139:
-

Hi [~jkreps][~nehanarkhede][~junrao][~jjkoshy][~guozhang], I just drafted a 
wiki page capturing my thoughts on the controller rewrite:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Controller+Redesign
It is still sketchy and needs a lot more detail, which I will add later. But it 
would be great if you could take a look first in case I am missing something 
important (which I probably am...)

> Add a separate controller message queue with higher priority on broker side 
> ---
>
> Key: KAFKA-2139
> URL: https://issues.apache.org/jira/browse/KAFKA-2139
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
>
> This ticket is supposed to work together with KAFKA-2029. 
> There are two issues with the current controller-to-broker messages.
> 1. On the controller side the messages are sent without synchronization.
> 2. On the broker side the controller messages share the same queue as client 
> messages.
> The problem here is that brokers process the controller messages for the same 
> partition at different times and the variation can be big. This causes 
> unnecessary data loss and prolongs preferred leader election / controlled 
> shutdown / partition reassignment, etc.
> KAFKA-2029 was trying to add a boundary between messages for different 
> partitions. For example, before leader migration for the previous partition 
> finishes, the leader migration for the next partition won't begin.
> This ticket is trying to let the broker process controller messages faster. 
> The idea is to have a separate queue to hold controller messages; if there 
> are controller messages, the KafkaApi thread will handle those first, 
> otherwise it will process messages from clients (see the sketch below).
> Those two tickets are not the ultimate solution to the current controller 
> problems; they just mitigate them with minor code changes. Moving forward, we 
> still need to think about rewriting the controller in a cleaner way.
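
For illustration, a minimal sketch of the queueing idea described above; the 
names and structure are hypothetical and not the actual KafkaApis/request 
channel code.

{code}
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// Two queues; handler threads always drain controller requests before client requests.
class PrioritizedRequestQueues[R] {
  private val controllerQueue = new LinkedBlockingQueue[R]()
  private val clientQueue = new LinkedBlockingQueue[R]()

  def sendControllerRequest(request: R): Unit = controllerQueue.put(request)
  def sendClientRequest(request: R): Unit = clientQueue.put(request)

  // Prefer a pending controller request; otherwise wait briefly for a client request.
  def nextRequest(pollMs: Long): Option[R] = {
    val controllerRequest = controllerQueue.poll()
    if (controllerRequest != null) Some(controllerRequest)
    else Option(clientQueue.poll(pollMs, TimeUnit.MILLISECONDS))
  }
}
{code}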



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: [VOTE] KIP-11- Authorization design for kafka security

2015-04-26 Thread Sun, Dapeng
Hi Parth

The design looks good; a few minor comments below. Since I just started looking 
into the discussion and may have missed many previous discussions, I'm sorry if 
these comments have already been discussed.

1. About SimpleAclAuthorizer (SimpleAuthorizer):
a. As I understand it, there should be only one privilege type (allow/deny) for 
a topic on a principal, or we should make deny take precedence over allow.
For example, with acl_1 "host1 -> group1 -> user1 -> read -> allow" and acl_2 
"host1 -> group1 -> user1 -> read -> deny" on the same topic, the result may be 
hard to understand. Do you think it's necessary to add some details about this 
to the wiki? (A minimal sketch of the deny-over-allow check follows these 
comments.)
b. When we authorize a user on a topic, we should probably check the user-level 
acl first, then the user's group-level acl, and finally the host-level and 
default-level acls. Do you think it's necessary to add content like this to the 
wiki?
For example, "host1 -> group1 -> user1"  >  "host1 -> group1"  >  "host1"

2. About SimpleAclAuthorizer (Acl JSON will be stored in ZooKeeper):
a. It may be better to store the acl JSON hierarchically; that would make it 
easier to search and authorize. For example, when we authorize a user, we only 
need the user-related acls.
b. I found that one acl may contain multiple principals, operations and hosts. 
I strongly agree with providing APIs like these, but for the acls stored in 
ZooKeeper or in memory, it may be better to separate them into one principal, 
one operation and one host each, so we can make sure there are not many acls 
with the same meaning and make acl management easier.
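
To make comment 1.a concrete, a minimal sketch of a deny-over-allow check with 
a hypothetical Acl case class; this is not the KIP-11 Authorizer API, only an 
illustration of the proposed precedence.

{code}
// Hypothetical model: permission is either "allow" or "deny".
case class Acl(principal: String, host: String, operation: String, permission: String)

object DenyOverAllowSketch {
  // Authorized only if some matching acl allows and no matching acl denies.
  def authorized(acls: Seq[Acl], principal: String, host: String, operation: String): Boolean = {
    val matching = acls.filter(a =>
      a.principal == principal && a.host == host && a.operation == operation)
    matching.exists(_.permission == "allow") && !matching.exists(_.permission == "deny")
  }

  def main(args: Array[String]): Unit = {
    val acls = Seq(
      Acl("user1", "host1", "read", "allow"),
      Acl("user1", "host1", "read", "deny"))
    println(authorized(acls, "user1", "host1", "read")) // false: the deny wins
  }
}
{code}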


Regards
Dapeng

-Original Message-
From: Jun Rao [mailto:j...@confluent.io] 
Sent: Monday, April 27, 2015 5:02 AM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-11- Authorization design for kafka security

A few more minor comments.

100. To make it clear, perhaps we should rename the resource "group" to 
consumer-group. We can probably make the same change in CLI as well so that 
it's not confused with user group.

101. Currently, create is only at the cluster level. Should it also be at topic 
level? For example, perhaps it's useful to allow only user X to create topic X.

Thanks,

Jun


On Sun, Apr 26, 2015 at 12:36 AM, Gwen Shapira 
wrote:

> Thanks for clarifying, Parth. I think you are taking the right 
> approach here.
>
> On Fri, Apr 24, 2015 at 11:46 AM, Parth Brahmbhatt 
>  wrote:
> > Sorry Gwen, completely misunderstood the question :-).
> >
> > * Does everyone have the privilege to create a new Group and use it 
> > to consume from Topics he's already privileged on?
> > Yes in current proposal. I did not see an API to create 
> > group
> but if you
> > have a READ permission on a TOPIC and WRITE permission on that Group 
> > you are free to join and consume.
> >
> >
> > * Will the CLI tool be used to manage group membership too?
> > Yes and I think that means I need to add --group. Updating the 
> > the
> KIP. Thanks
> > for pointing this out.
> >
> > * Groups are kind of ephemeral, right? If all consumers in the group 
> > disconnect the group is gone, AFAIK. Do we preserve the ACLs? Or do 
> > we treat the new group as completely new resource? Can we create 
> > ACLs before the group exists, in anticipation of it getting created?
> > I have considered any auto delete and auto create as out of
> scope for the
> > first release. So Right now I was going with preserving the acls. Do 
> > you see any issues with this? Auto deleting would mean authorizer 
> > will now have to get into implementation details of kafka which I 
> > was trying to avoid.
> >
> > Thanks
> > Parth
> >
> > On 4/24/15, 11:33 AM, "Gwen Shapira"  wrote:
> >
> >>We are not talking about same Groups :)
> >>
> >>I meant, Groups of consumers (which KIP-11 lists as a separate 
> >>resource in the Privilege table)
> >>
> >>On Fri, Apr 24, 2015 at 11:31 AM, Parth Brahmbhatt 
> >> wrote:
> >>> I see Groups as something we can add incrementally in the current
> model.
> >>> The acls take principalType: name so groups can be represented as
> group:
> >>> groupName. We are not managing group memberships anywhere in kafka 
> >>> and
> I
> >>> don’t see the need to do so.
> >>>
> >>> So for a topic1 using the CLI an admin can add an acl to grant 
> >>> access
> to
> >>> group:kafka-test-users.
> >>>
> >>> The authorizer implementation can have a plugin to map 
> >>>authenticated user  to groups ( This is how hadoop and storm 
> >>>works). The plugin could be  mapping user to linux/ldap/active 
> >>>directory groups but that is again upto  the implementation.
> >>>
> >>> What we are offering is an interface that is extensible so these 
> >>>features  can be added incrementally. I can add support for this in 
> >>>the first  release but don’t necessarily see why this would be 
> >>>absolute necessity.
> >>>
> >>> Thanks
> >>> Parth
> >>>
> >>> On 4/24/15, 11:00 AM, "Gwen Shapira"  wrote:
> >>>
> Thanks.
> 
> One more thing I'm missing in the KIP is details on the G

[jira] [Created] (KAFKA-2152) Console producer fails to start when server running with broker.id != 0

2015-04-26 Thread Lior Gonnen (JIRA)
Lior Gonnen created KAFKA-2152:
--

 Summary: Console producer fails to start when server running with 
broker.id != 0
 Key: KAFKA-2152
 URL: https://issues.apache.org/jira/browse/KAFKA-2152
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2.1
Reporter: Lior Gonnen
Priority: Critical


Scenario to reproduce:
1. Start zookeeper as usual: bin/zookeeper-server-start.sh 
config/zookeeper.properties
2. Start local server as usual: bin/kafka-server-start.sh 
config/server.properties
3. Start console producer: bin/kafka-console-producer.sh --broker-list 
localhost:9092 --topic test
4. Producer starts as usual, and allows sending messages
5. Stop the producer and server
6. In config/server.properties, change broker.id to 2 (or any other non-zero 
number)
7. Repeat steps 2 and 3
8. The received output is as follows:

[2015-04-26 20:00:47,571] WARN Error while fetching metadata  partition 0  
leader: none  replicas:   isr:   isUnderReplicated: false for topic 
partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
(kafka.producer.BrokerPartitionInfo)
[2015-04-26 20:00:47,575] WARN Failed to collate messages by topic,partition 
due to: No leader for any partition in topic test 
(kafka.producer.async.DefaultEventHandler)
[2015-04-26 20:00:47,682] WARN Error while fetching metadata  partition 0  
leader: none  replicas:   isr:   isUnderReplicated: false for topic 
partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
(kafka.producer.BrokerPartitionInfo)
[2015-04-26 20:00:47,682] WARN Failed to collate messages by topic,partition 
due to: No leader for any partition in topic test 
(kafka.producer.async.DefaultEventHandler)
[2015-04-26 20:00:47,789] WARN Error while fetching metadata  partition 0  
leader: none  replicas:   isr:   isUnderReplicated: false for topic 
partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
(kafka.producer.BrokerPartitionInfo)
[2015-04-26 20:00:47,790] WARN Failed to collate messages by topic,partition 
due to: No leader for any partition in topic test 
(kafka.producer.async.DefaultEventHandler)
[2015-04-26 20:00:47,897] WARN Error while fetching metadata  partition 0  
leader: none  replicas:   isr:   isUnderReplicated: false for topic 
partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
(kafka.producer.BrokerPartitionInfo)
[2015-04-26 20:00:47,897] WARN Failed to collate messages by topic,partition 
due to: No leader for any partition in topic test 
(kafka.producer.async.DefaultEventHandler)
[2015-04-26 20:00:48,002] WARN Error while fetching metadata  partition 0  
leader: none  replicas:   isr:   isUnderReplicated: false for topic 
partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
(kafka.producer.BrokerPartitionInfo)
[2015-04-26 20:00:48,004] ERROR Failed to send requests for topics test with 
correlation ids in [0,8] (kafka.producer.async.DefaultEventHandler)
[2015-04-26 20:00:48,004] ERROR Error in handling batch of 1 events 
(kafka.producer.async.ProducerSendThread)
kafka.common.FailedToSendMessageException: Failed to send messages after 3 
tries.
at 
kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
at 
kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105)
at 
kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88)
at 
kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68)
at scala.collection.immutable.Stream.foreach(Stream.scala:594)
at 
kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67)
at 
kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2147) Unbalanced replication can cause extreme purgatory growth

2015-04-26 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513475#comment-14513475
 ] 

Jiangjie Qin commented on KAFKA-2147:
-

Sorry for not responding to you on the mailing list sooner... I am actually not 
sure your hypothesis holds, because a ProduceRequest to broker A will not 
*trigger* a replica fetch request in broker B. Broker B will be fetching 
continuously whether or not there is data for it. Theoretically, the fetch 
purgatory size should be upper-bounded by the number of fetcher threads (either 
replica fetchers or consumer fetchers), given that we are still using the 
blocking channel in the fetchers. Have you checked how many replica fetchers 
you are using?

> Unbalanced replication can cause extreme purgatory growth
> -
>
> Key: KAFKA-2147
> URL: https://issues.apache.org/jira/browse/KAFKA-2147
> Project: Kafka
>  Issue Type: Bug
>  Components: purgatory, replication
>Affects Versions: 0.8.2.1
>Reporter: Evan Huus
>Assignee: Joel Koshy
>
> Apologies in advance, this is going to be a bit of complex description, 
> mainly because we've seen this issue several different ways and we're still 
> tying them together in terms of root cause and analysis.
> It is worth noting now that we have all our producers set up to send 
> RequiredAcks==-1, and that this includes all our MirrorMakers.
> I understand the purgatory is being rewritten (again) for 0.8.3. Hopefully 
> that will incidentally fix this issue, or at least render it moot.
> h4. Symptoms
> Fetch request purgatory on a broker or brokers grows rapidly and steadily at 
> a rate of roughly 1-5K requests per second. Heap memory used also grows to 
> keep pace. When 4-5 million requests have accumulated in purgatory, the 
> purgatory is drained, causing a substantial latency spike. The node will tend 
> to drop leadership, replicate, and recover.
> h5. Case 1 - MirrorMaker
> We first noticed this case when enabling mirrormaker. We had one primary 
> cluster already, with many producers and consumers. We created a second, 
> identical cluster and enabled replication from the original to the new 
> cluster on some topics using mirrormaker. This caused all six nodes in the 
> new cluster to exhibit the symptom in lockstep - their purgatories would all 
> grow together, and get drained within about 20 seconds of each other. The 
> cluster-wide latency spikes at this time caused several problems for us.
> Turning MM on and off turned the problem on and off very precisely. When we 
> stopped MM, the purgatories would all drop to normal levels immediately, and 
> would start climbing again when we restarted it.
> Note that this is the *fetch* purgatories on the brokers that MM was 
> *producing* to, which indicates fairly strongly that this is a replication 
> issue, not a MM issue.
> This particular cluster and MM setup was abandoned for other reasons before 
> we could make much progress debugging.
> h5. Case 2 - Broker 6
> The second time we saw this issue was on the newest broker (broker 6) in the 
> original cluster. For a long time we were running with five nodes, and 
> eventually added a sixth to handle the increased load. At first, we moved 
> only a handful of higher-volume partitions to this broker. Later, we created 
> a group of new topics (totalling around 100 partitions) for testing purposes 
> that were spread automatically across all six nodes. These topics saw 
> occasional traffic, but were generally unused. At this point broker 6 had 
> leadership for about an equal number of high-volume and unused partitions, 
> about 15-20 of each.
> Around this time (we don't have detailed enough data to prove real 
> correlation unfortunately), the issue started appearing on this broker as 
> well, but not on any of the other brokers in the cluster.
> h4. Debugging
> The first thing we tried was to reduce the 
> `fetch.purgatory.purge.interval.requests` from the default of 1000 to a much 
> lower value of 200. This had no noticeable effect at all.
> We then enabled debug logging on broker06 and started looking through that. I 
> can attach complete log samples if necessary, but the thing that stood out 
> for us was a substantial number of the following lines:
> {noformat}
> [2015-04-23 20:05:15,196] DEBUG [KafkaApi-6] Putting fetch request with 
> correlation id 49939 from client ReplicaFetcherThread-0-6 into purgatory 
> (kafka.server.KafkaApis)
> {noformat}
> The volume of these lines seemed to match (approximately) the fetch purgatory 
> growth on that broker.
> At this point we developed a hypothesis (detailed below) which guided our 
> subsequent debugging tests:
> - Setting a daemon up to produce regular random data to all of the topics led 
> by kafka06 (specific

[jira] [Commented] (KAFKA-2152) Console producer fails to start when server running with broker.id != 0

2015-04-26 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513482#comment-14513482
 ] 

Gwen Shapira commented on KAFKA-2152:
-

When you created the topic at first, the partitions were created on broker.id=0.
This configuration is saved in ZK and is "sticky". When you changed broker.id 
to 2, broker 2 is not a leader of any partition on the topic, nor does it have 
any replicas (since it did not exist when the topic was created). 

So the topic has broker 0 assigned as the leader and the only replica for 
every partition, and broker 0 is not available. That is why you get "leader: 
none replicas: isr: isUnderReplicated:". Not the clearest error, but it 
basically says "Your topic has no leader, no replicas, and none of the replicas 
are in sync with the leader".

Meanwhile, broker 2 has nothing assigned to it. If you need to support this 
scenario (changing broker id), you need to use the reassignment tool 
(https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-6.ReassignPartitionsTool)
 to move the partitions to the new broker.

The behavior you saw is expected, but a bit challenging to explain. Feel free 
to ask questions if I wasn't clear.
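
For reference, a sketch of the reassignment input the tool expects when moving 
the single partition of "test" onto broker 2; the file name, ZooKeeper address 
and broker id are assumptions for this example.

{code}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object ReassignmentSketch {
  def main(args: Array[String]): Unit = {
    // Reassignment JSON: move partition test-0 so that broker 2 is its only replica.
    val json =
      """{"version":1,
        | "partitions":[{"topic":"test","partition":0,"replicas":[2]}]}""".stripMargin
    Files.write(Paths.get("reassign.json"), json.getBytes(StandardCharsets.UTF_8))
    // Typical invocation afterwards (assuming a local ZooKeeper):
    //   bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
    //     --reassignment-json-file reassign.json --execute
  }
}
{code}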

> Console producer fails to start when server running with broker.id != 0
> ---
>
> Key: KAFKA-2152
> URL: https://issues.apache.org/jira/browse/KAFKA-2152
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Lior Gonnen
>Priority: Critical
>
> Scenario to reproduce:
> 1. Start zookeeper as usual: bin/zookeeper-server-start.sh 
> config/zookeeper.properties
> 2. Start local server as usual: bin/kafka-server-start.sh 
> config/server.properties
> 3. Start console producer: bin/kafka-console-producer.sh --broker-list 
> localhost:9092 --topic test
> 4. Producer starts as usual, and allows sending messages
> 5. Stop the producer and server
> 6. In config/server.properties, change broker.id to 2 (or any other non-zero 
> number)
> 7. Repeat steps 2 and 3
> 8. The received output is as follows:
> [2015-04-26 20:00:47,571] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,575] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:47,682] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,682] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:47,789] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,790] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:47,897] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,897] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:48,002] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:48,004] ERROR Failed to send requests for topics test with 
> correlation ids in [0,8] (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:48,004] ERROR Error in handling batch of 1 events 
> (kafka.producer.async.ProducerSendThread)
> kafka.common.FailedToSendMessageException: Failed to send messages after 3 
> tries.
>   at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
>   at 
> kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105)
>   at 
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(Produc

[jira] [Resolved] (KAFKA-2152) Console producer fails to start when server running with broker.id != 0

2015-04-26 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved KAFKA-2152.
-
Resolution: Not A Problem

> Console producer fails to start when server running with broker.id != 0
> ---
>
> Key: KAFKA-2152
> URL: https://issues.apache.org/jira/browse/KAFKA-2152
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Lior Gonnen
>Priority: Critical
>
> Scenario to reproduce:
> 1. Start zookeeper as usual: bin/zookeeper-server-start.sh 
> config/zookeeper.properties
> 2. Start local server as usual: bin/kafka-server-start.sh 
> config/server.properties
> 3. Start console producer: bin/kafka-console-producer.sh --broker-list 
> localhost:9092 --topic test
> 4. Producer starts as usual, and allows sending messages
> 5. Stop the producer and server
> 6. In config/server.properties, change broker.id to 2 (or any other non-zero 
> number)
> 7. Repeat steps 2 and 3
> 8. The received output is as follows:
> [2015-04-26 20:00:47,571] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,575] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:47,682] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,682] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:47,789] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,790] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:47,897] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,897] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:48,002] WARN Error while fetching metadata  partition 0 
> leader: nonereplicas:   isr:isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:48,004] ERROR Failed to send requests for topics test with 
> correlation ids in [0,8] (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:48,004] ERROR Error in handling batch of 1 events 
> (kafka.producer.async.ProducerSendThread)
> kafka.common.FailedToSendMessageException: Failed to send messages after 3 
> tries.
>   at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
>   at 
> kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105)
>   at 
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88)
>   at 
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68)
>   at scala.collection.immutable.Stream.foreach(Stream.scala:594)
>   at 
> kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67)
>   at 
> kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33557: Patch for KAFKA-1936

2015-04-26 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33557/#review81654
---


Should we also apply the same rule to bytes out rate?


core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala


Typo, larger.


- Jiangjie Qin


On April 25, 2015, 11:36 p.m., Dong Lin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33557/
> ---
> 
> (Updated April 25, 2015, 11:36 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1936
> https://issues.apache.org/jira/browse/KAFKA-1936
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1936; Track offset commit requests separately from produce requests
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/log/Log.scala 
> 5563f2de8113a0ece8929bec9c75dbf892abbb66 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> 8ddd325015de4245fd2cf500d8b0e8c1fd2bc7e8 
>   core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 
> 652208a70f66045b854549d93cbbc2b77c24b10b 
> 
> Diff: https://reviews.apache.org/r/33557/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Dong Lin
> 
>



[jira] [Commented] (KAFKA-1940) Initial checkout and build failing

2015-04-26 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513494#comment-14513494
 ] 

Jiangjie Qin commented on KAFKA-1940:
-

The kafka.producer.AsyncProducerTest failures should be unrelated. KAFKA-2103 
was created for those two test failures.

> Initial checkout and build failing
> --
>
> Key: KAFKA-1940
> URL: https://issues.apache.org/jira/browse/KAFKA-1940
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.1.2
> Environment: Groovy:   1.8.6
> Ant:  Apache Ant(TM) version 1.9.2 compiled on July 8 2013
> Ivy:  2.2.0
> JVM:  1.8.0_25 (Oracle Corporation 25.25-b02)
> OS:   Windows 7 6.1 amd64
>Reporter: Martin Lemanski
>  Labels: build
> Attachments: zinc-upgrade.patch
>
>
> when performing `gradle wrapper` and `gradlew build` as a "new" developer, I 
> get an exception: 
> {code}
> C:\development\git\kafka>gradlew build --stacktrace
> <...>
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':core:compileScala'.
> > com.typesafe.zinc.Setup.create(Lcom/typesafe/zinc/ScalaLocation;Lcom/typesafe/zinc/SbtJars;Ljava/io/File;)Lcom/typesaf
> e/zinc/Setup;
> {code}
> Details: https://gist.github.com/mlem/ddff83cc8a25b040c157
> Current Commit:
> {code}
> C:\development\git\kafka>git rev-parse --verify HEAD
> 71602de0bbf7727f498a812033027f6cbfe34eb8
> {code}
> I am evaluating kafka for my company and wanted to run some tests with it, 
> but couldn't due to this error. I know gradle can be tricky and it's not easy 
> to set everything up correctly, but this kind of bug turns possible 
> committers/users off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2103) kafka.producer.AsyncProducerTest failure.

2015-04-26 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513499#comment-14513499
 ] 

Jiangjie Qin commented on KAFKA-2103:
-

[~nehanarkhede] Yes, it is still failing. I'll make it a sub-task for the 
umbrella ticket. 

> kafka.producer.AsyncProducerTest failure.
> -
>
> Key: KAFKA-2103
> URL: https://issues.apache.org/jira/browse/KAFKA-2103
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jiangjie Qin
>
> I saw this test consistently failing on trunk.
> The recent changes are KAFKA-2099, KAFKA-1926, KAFKA-1809.
> kafka.producer.AsyncProducerTest > testNoBroker FAILED
> org.scalatest.junit.JUnitTestFailedError: Should fail with 
> FailedToSendMessageException
> at 
> org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101)
> at 
> org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:149)
> at org.scalatest.Assertions$class.fail(Assertions.scala:711)
> at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:149)
> at 
> kafka.producer.AsyncProducerTest.testNoBroker(AsyncProducerTest.scala:300)
> kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED
> kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED
> kafka.producer.AsyncProducerTest > testFailedSendRetryLogic FAILED
> kafka.common.FailedToSendMessageException: Failed to send messages after 
> 3 tries.
> at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:91)
> at 
> kafka.producer.AsyncProducerTest.testFailedSendRetryLogic(AsyncProducerTest.scala:415)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33557: Patch for KAFKA-1936

2015-04-26 Thread Dong Lin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33557/
---

(Updated April 27, 2015, 4:20 a.m.)


Review request for kafka.


Bugs: KAFKA-1936
https://issues.apache.org/jira/browse/KAFKA-1936


Repository: kafka


Description
---

KAFKA-1936; Track offset commit requests separately from produce requests


Diffs (updated)
-

  core/src/main/scala/kafka/log/Log.scala 
5563f2de8113a0ece8929bec9c75dbf892abbb66 
  core/src/main/scala/kafka/server/KafkaApis.scala 
b4004aa3a1456d337199aa1245fb0ae61f6add46 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
8ddd325015de4245fd2cf500d8b0e8c1fd2bc7e8 
  core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 
652208a70f66045b854549d93cbbc2b77c24b10b 

Diff: https://reviews.apache.org/r/33557/diff/


Testing
---


Thanks,

Dong Lin



[jira] [Updated] (KAFKA-1936) Track offset commit requests separately from produce requests

2015-04-26 Thread Dong Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated KAFKA-1936:

Attachment: KAFKA-1936_2015-04-26_21:20:28.patch

> Track offset commit requests separately from produce requests
> -
>
> Key: KAFKA-1936
> URL: https://issues.apache.org/jira/browse/KAFKA-1936
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Aditya Auradkar
>Assignee: Dong Lin
> Attachments: KAFKA-1936.patch, KAFKA-1936_2015-04-26_21:20:28.patch
>
>
> In ReplicaManager, failed and total produce requests are updated from 
> appendToLocalLog. Since offset commit requests also follow the same path, 
> they are counted along with produce requests. Add a metric and count them 
> separately.
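
For illustration, a rough sketch of the separation being proposed, using 
hypothetical counters rather than Kafka's actual metrics classes: the append 
path picks a produce meter or an offset-commit meter based on where the request 
came from.

{code}
import java.util.concurrent.atomic.AtomicLong

object RequestStatsSketch {
  val totalProduce = new AtomicLong(0)
  val failedProduce = new AtomicLong(0)
  val totalOffsetCommit = new AtomicLong(0)
  val failedOffsetCommit = new AtomicLong(0)

  // Called from the append path with a flag telling us whether this request
  // originated from an offset commit rather than a normal produce.
  def record(isOffsetCommit: Boolean, failed: Boolean): Unit = {
    if (isOffsetCommit) {
      totalOffsetCommit.incrementAndGet()
      if (failed) failedOffsetCommit.incrementAndGet()
    } else {
      totalProduce.incrementAndGet()
      if (failed) failedProduce.incrementAndGet()
    }
  }
}
{code}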



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1936) Track offset commit requests separately from produce requests

2015-04-26 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513503#comment-14513503
 ] 

Dong Lin commented on KAFKA-1936:
-

Updated reviewboard https://reviews.apache.org/r/33557/diff/
 against branch origin/trunk

> Track offset commit requests separately from produce requests
> -
>
> Key: KAFKA-1936
> URL: https://issues.apache.org/jira/browse/KAFKA-1936
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Aditya Auradkar
>Assignee: Dong Lin
> Attachments: KAFKA-1936.patch, KAFKA-1936_2015-04-26_21:20:28.patch
>
>
> In ReplicaManager, failed and total produce requests are updated from 
> appendToLocalLog. Since offset commit requests also follow the same path, 
> they are counted along with produce requests. Add a metric and count them 
> separately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

