Re: [jira] [Commented] (KAFKA-3919) Broker fails to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2016-08-18 Thread Andrew Coates
Hi [~junrao], has there been any more discussion or progress on this issue?

Thanks,

Andy
On Tue, 12 Jul 2016 at 10:11, Andy Coates (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372565#comment-15372565
> ]
>
> Andy Coates commented on KAFKA-3919:
> 
>
> [~junrao]  Good stuff. Look forward to hearing from you and getting
> involved more =)
>
> > Broker fails to start after ungraceful shutdown due to
> non-monotonically incrementing offsets in logs
> >
> --
> >
> > Key: KAFKA-3919
> > URL: https://issues.apache.org/jira/browse/KAFKA-3919
> > Project: Kafka
> >  Issue Type: Bug
> >  Components: core
> >Affects Versions: 0.9.0.1
> >Reporter: Andy Coates
> >
> > Hi All,
> > I encountered an issue with Kafka following a power outage that saw a
> proportion of our cluster disappear. When the power came back on several
> brokers halted on start up with the error:
> > {noformat}
> >   Fatal error during KafkaServerStartable startup. Prepare to
> shutdown
> >   kafka.common.InvalidOffsetException: Attempt to append an offset
> (1239742691) to position 35728 no larger than the last offset appended
> (1239742822) to
> /data3/kafka/mt_xp_its_music_main_itsevent-20/001239444214.index.
> >   at
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
> >   at
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> >   at
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> >   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> >   at kafka.log.OffsetIndex.append(OffsetIndex.scala:197)
> >   at kafka.log.LogSegment.recover(LogSegment.scala:188)
> >   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
> >   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
> >   at
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> >   at
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> >   at
> scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> >   at
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> >   at kafka.log.Log.loadSegments(Log.scala:160)
> >   at kafka.log.Log.<init>(Log.scala:90)
> >   at
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:150)
> >   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
> >   at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >   at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >   at java.lang.Thread.run(Thread.java:745)
> > {noformat}
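For context on the invariant being violated above: an offset index only accepts entries whose offsets are strictly larger than the last offset already appended, which is what the InvalidOffsetException reports. A minimal, hypothetical Java sketch of that check (simplified names, not Kafka's actual OffsetIndex.append):

{code}
// Hypothetical, simplified illustration of the monotonicity check behind the
// InvalidOffsetException above; field and method names are made up, and a plain
// IllegalStateException stands in for kafka.common.InvalidOffsetException.
final class SimpleOffsetIndex {
    private long lastOffset = -1L;
    private int entries = 0;

    void append(long offset, int position) {
        if (entries > 0 && offset <= lastOffset) {
            throw new IllegalStateException(
                "Attempt to append an offset (" + offset + ") to position " + entries
                    + " no larger than the last offset appended (" + lastOffset + ")");
        }
        // ... write the (offset, position) entry to the memory-mapped index ...
        lastOffset = offset;
        entries++;
    }
}
{code}

Segment recovery after the unclean shutdown replays the log through this kind of check (LogSegment.recover -> OffsetIndex.append in the trace), so a segment whose offsets dip backwards cannot be re-indexed and the broker halts.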
> > The only way to recover the brokers was to delete the log files that
> contained non monotonically incrementing offsets.
> > We've spent some time digging through the logs and I feel I may have
> worked out the sequence of events leading to this issue, (though this is
> based on some assumptions I've made about the way Kafka is working, which
> may be wrong).
> > First off, we have unclean leadership elections disabled. (We did later
> enable them to help get around some other issues we were having, but this
> was several hours after this issue manifested.) We're producing to the
> topic with gzip compression and acks=1.
> > We looked through the data logs that were causing the brokers not to
> start. What we found was that the initial part of the log has monotonically
> increasing offsets, where each compressed batch normally contained one or
> two records. Then there is a batch that contains many records, whose first
> record has an offset below the previous batch and whose last record has
> an offset above the previous batch. Following on from this there continues
> a period of large batches, with monotonically increasing offsets, and then
> the log returns to batches with one or two records.
> > Our working assumption here is that the period before the offset dip,
> with the small batches, is pre-outage normal operation. The period of
> larger batches is from just after the outage, where producers have a
> backlog to process when the partition becomes available, and then things
> return to normal batch sizes again once the backlog clears.
> > We did also look through Kafka's application logs to try and piece
> together the series of events leading up to this. Here’s what we know
> happened, with regard to one partition that h

Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Ismael Juma
Congratulations Gwen! Great news.

Ismael

On 18 Aug 2016 2:44 am, "Jun Rao"  wrote:

> Hi, Everyone,
>
> Gwen Shapira has been active in the Kafka community since she became a
> Kafka committer
> about a year ago. I am glad to announce that Gwen is now a member of Kafka
> PMC.
>
> Congratulations, Gwen!
>
> Jun
>


[jira] [Commented] (KAFKA-4019) LogCleaner should grow read/write buffer to max message size for the topic

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426092#comment-15426092
 ] 

ASF GitHub Bot commented on KAFKA-4019:
---

GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/1758

KAFKA-4019: Update log cleaner to handle max message size of topics

Grow the read and write buffers of the cleaner up to the maximum message size of 
the log being cleaned if the topic has a larger max message size than the default 
config of the broker.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4019

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1758.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1758


commit fa89ba94cbc2abc54a94497c3207bb2bce62743a
Author: Rajini Sivaram 
Date:   2016-08-18T08:08:17Z

KAFKA-4019: Update log cleaner to handle max message size of topics




> LogCleaner should grow read/write buffer to max message size for the topic
> --
>
> Key: KAFKA-4019
> URL: https://issues.apache.org/jira/browse/KAFKA-4019
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Rajini Sivaram
>
> Currently, LogCleaner.growBuffers() only grows the buffer up to the 
> default max message size. However, since the max message size can be 
> customized at the topic level, LogCleaner should allow the buffer to grow up 
> to the max message size allowed by the topic. Otherwise, the cleaner will get 
> stuck on a large message.
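A rough sketch of the buffer growth being asked for, using a hypothetical helper rather than the actual LogCleaner.growBuffers() code: grow geometrically, but cap at the topic's max message size instead of the broker default.

{code}
import java.nio.ByteBuffer;

// Hypothetical illustration of growing a cleaner buffer up to the topic-level
// max message size (maxTopicMessageSize) rather than the broker default.
// This sketches the idea in the ticket, not the actual Kafka patch.
final class BufferGrower {
    static ByteBuffer grow(ByteBuffer current, int maxTopicMessageSize) {
        if (current.capacity() >= maxTopicMessageSize) {
            throw new IllegalStateException(
                "Buffer already at the topic's max message size: " + maxTopicMessageSize);
        }
        // Double the capacity, but never beyond what the topic allows.
        int newSize = Math.min(current.capacity() * 2, maxTopicMessageSize);
        return ByteBuffer.allocate(newSize);
    }
}
{code}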



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1758: KAFKA-4019: Update log cleaner to handle max messa...

2016-08-18 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/1758

KAFKA-4019: Update log cleaner to handle max message size of topics

Grow the read and write buffers of the cleaner up to the maximum message size of 
the log being cleaned if the topic has a larger max message size than the default 
config of the broker.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4019

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1758.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1758


commit fa89ba94cbc2abc54a94497c3207bb2bce62743a
Author: Rajini Sivaram 
Date:   2016-08-18T08:08:17Z

KAFKA-4019: Update log cleaner to handle max message size of topics




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-4045) Investigate feasibility of hooking into RocksDb's cache

2016-08-18 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska resolved KAFKA-4045.
-
Resolution: Fixed

> Investigate feasibility of hooking into RocksDb's cache
> ---
>
> Key: KAFKA-4045
> URL: https://issues.apache.org/jira/browse/KAFKA-4045
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Eno Thereska
>Assignee: Damian Guy
> Fix For: 0.10.1.0
>
>
> Ideally we could hook a listener into RocksDB's cache so that when entries are 
> flushed or evicted from the cache the listener is called (and can 
> subsequently perform Kafka Streams-specific functions, like forwarding a record 
> downstream). That way we don't have to build our own cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-4051) Strange behavior during rebalance when turning the OS clock back

2016-08-18 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-4051:
-

Assignee: Rajini Sivaram

> Strange behavior during rebalance when turning the OS clock back
> 
>
> Key: KAFKA-4051
> URL: https://issues.apache.org/jira/browse/KAFKA-4051
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.0.0
> Environment: OS: Ubuntu 14.04 - 64bits
>Reporter: Gabriel Ibarra
>Assignee: Rajini Sivaram
>
> If a rebalance is performed after turning the OS clock back, then the Kafka 
> server enters a loop and the rebalance cannot be completed until the system 
> clock returns to the previous date/time.
> Steps to Reproduce:
> - Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will become the 
> owner of all the partitions.
> - Turn the system (OS) clock back, for instance by 1 hour.
> - Start a new consumer for TOPIC_NAME using the same group id; it will force 
> a rebalance.
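A minimal consumer sketch for the reproduction steps above; TOPIC_NAME and GROUP_NAME are the reporter's placeholders and the bootstrap address is an assumption. Run one instance, turn the OS clock back, then start a second instance with the same group id.

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ClockBackRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "GROUP_NAME");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("TOPIC_NAME"));
            while (true) {
                // Starting a second copy of this program (after moving the clock
                // back) forces the rebalance described in the report.
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
{code}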
> After these actions the Kafka server logs constantly display the messages 
> below, and after a while both consumers stop receiving messages. This 
> condition lasts at least as long as the clock was turned back (1 hour in this 
> example), and after this time Kafka comes back to work.
> [2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 
> is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
> Because some systems may have NTP enabled, or an administrator may change the 
> system clock (date/time), it is important that this can be done safely. 
> Currently the only safe way to do it is the following:
> 1- Shut down the Kafka server.
> 2- Change the date/time.
> 3- Bring the Kafka server back up.
> But this approach only works if the change is made by an administrator, not by 
> NTP. Also, on many systems taking down the Kafka server might cause 
> information to be lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4051) Strange behavior during rebalance when turning the OS clock back

2016-08-18 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426256#comment-15426256
 ] 

Ismael Juma commented on KAFKA-4051:


As discussed in the thread, Kafka uses System.currentTimeMillis() in a number 
of places, which means that changing the system clock backwards is bound to 
cause issues. I wouldn't be surprised if there are other problems in addition 
to the one mentioned here.

> Strange behavior during rebalance when turning the OS clock back
> 
>
> Key: KAFKA-4051
> URL: https://issues.apache.org/jira/browse/KAFKA-4051
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.0.0
> Environment: OS: Ubuntu 14.04 - 64bits
>Reporter: Gabriel Ibarra
>Assignee: Rajini Sivaram
>
> If a rebalance is performed after turning the OS clock back, then the Kafka 
> server enters a loop and the rebalance cannot be completed until the system 
> clock returns to the previous date/time.
> Steps to Reproduce:
> - Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will become the 
> owner of all the partitions.
> - Turn the system (OS) clock back, for instance by 1 hour.
> - Start a new consumer for TOPIC_NAME using the same group id; it will force 
> a rebalance.
> After these actions the Kafka server logs constantly display the messages 
> below, and after a while both consumers stop receiving messages. This 
> condition lasts at least as long as the clock was turned back (1 hour in this 
> example), and after this time Kafka comes back to work.
> [2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 
> is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
> Because some systems may have NTP enabled, or an administrator may change the 
> system clock (date/time), it is important that this can be done safely. 
> Currently the only safe way to do it is the following:
> 1- Shut down the Kafka server.
> 2- Change the date/time.
> 3- Bring the Kafka server back up.
> But this approach only works if the change is made by an administrator, not by 
> NTP. Also, on many systems taking down the Kafka server might cause 
> information to be lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-3875) Transient test failure: kafka.api.SslProducerSendTest.testSendNonCompressedMessageWithCreateTime

2016-08-18 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reopened KAFKA-3875:


> Transient test failure: 
> kafka.api.SslProducerSendTest.testSendNonCompressedMessageWithCreateTime
> 
>
> Key: KAFKA-3875
> URL: https://issues.apache.org/jira/browse/KAFKA-3875
> Project: Kafka
>  Issue Type: Sub-task
>  Components: unit tests
>Reporter: Ismael Juma
>Assignee: Jun Rao
>  Labels: transient-unit-test-failure
> Fix For: 0.10.1.0
>
>
> It failed in a couple of builds.
> {code}
> java.lang.AssertionError: Should have offset 100 but only successfully sent 0 
> expected:<100> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> kafka.api.BaseProducerSendTest.sendAndVerifyTimestamp(BaseProducerSendTest.scala:228)
>   at 
> kafka.api.BaseProducerSendTest.testSendNonCompressedMessageWithCreateTime(BaseProducerSendTest.scala:170)
> {code}
> And standard out:
> {code}
> org.scalatest.junit.JUnitTestFailedError: Send callback returns the following 
> exception
>   at 
> org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:103)
>   at 
> org.scalatest.junit.JUnitSuite.newAssertionFailedException(JUnitSuite.scala:79)
>   at org.scalatest.Assertions$class.fail(Assertions.scala:1348)
>   at org.scalatest.junit.JUnitSuite.fail(JUnitSuite.scala:79)
>   at 
> kafka.api.BaseProducerSendTest$callback$4$.onCompletion(BaseProducerSendTest.scala:209)
>   at 
> org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:109)
>   at 
> org.apache.kafka.clients.producer.internals.RecordBatch.maybeExpire(RecordBatch.java:155)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:245)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Batch containing 
> 1 record(s) expired due to timeout while requesting metadata from brokers for 
> topic-0
> java.lang.IllegalStateException: Cannot send after the producer is closed.
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:172)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:466)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:430)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:353)
>   at 
> kafka.api.BaseProducerSendTest$CloseCallback$1$$anonfun$onCompletion$1.apply(BaseProducerSendTest.scala:415)
>   at 
> kafka.api.BaseProducerSendTest$CloseCallback$1$$anonfun$onCompletion$1.apply(BaseProducerSendTest.scala:415)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.api.BaseProducerSendTest$CloseCallback$1.onCompletion(BaseProducerSendTest.scala:415)
>   at 
> org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:107)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:318)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:278)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:364)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> https://jenkins.confluent.io/job/kafka-trunk/905/
> https://jenkins.confluent.io/job/kafka-trunk/919 (the output is similar to 
> the first build)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3875) Transient test failure: kafka.api.SslProducerSendTest.testSendNonCompressedMessageWithCreateTime

2016-08-18 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426268#comment-15426268
 ] 

Ismael Juma commented on KAFKA-3875:


[~junrao], we had another failure for this test:

{code}
java.lang.AssertionError: Should have offset 100 but only successfully sent 0 
expected:<100> but was:<0>
...
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 
record(s) for topic-0 due to 9223372036854775801 ms has passed since batch 
creation plus linger time
{code}

That very large number seems suspicious.

https://jenkins.confluent.io/job/kafka-trunk/1074/testReport/junit/kafka.api/SslProducerSendTest/testSendNonCompressedMessageWithCreateTime/

> Transient test failure: 
> kafka.api.SslProducerSendTest.testSendNonCompressedMessageWithCreateTime
> 
>
> Key: KAFKA-3875
> URL: https://issues.apache.org/jira/browse/KAFKA-3875
> Project: Kafka
>  Issue Type: Sub-task
>  Components: unit tests
>Reporter: Ismael Juma
>Assignee: Jun Rao
>  Labels: transient-unit-test-failure
> Fix For: 0.10.1.0
>
>
> It failed in a couple of builds.
> {code}
> java.lang.AssertionError: Should have offset 100 but only successfully sent 0 
> expected:<100> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> kafka.api.BaseProducerSendTest.sendAndVerifyTimestamp(BaseProducerSendTest.scala:228)
>   at 
> kafka.api.BaseProducerSendTest.testSendNonCompressedMessageWithCreateTime(BaseProducerSendTest.scala:170)
> {code}
> And standard out:
> {code}
> org.scalatest.junit.JUnitTestFailedError: Send callback returns the following 
> exception
>   at 
> org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:103)
>   at 
> org.scalatest.junit.JUnitSuite.newAssertionFailedException(JUnitSuite.scala:79)
>   at org.scalatest.Assertions$class.fail(Assertions.scala:1348)
>   at org.scalatest.junit.JUnitSuite.fail(JUnitSuite.scala:79)
>   at 
> kafka.api.BaseProducerSendTest$callback$4$.onCompletion(BaseProducerSendTest.scala:209)
>   at 
> org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:109)
>   at 
> org.apache.kafka.clients.producer.internals.RecordBatch.maybeExpire(RecordBatch.java:155)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:245)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Batch containing 
> 1 record(s) expired due to timeout while requesting metadata from brokers for 
> topic-0
> java.lang.IllegalStateException: Cannot send after the producer is closed.
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:172)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:466)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:430)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:353)
>   at 
> kafka.api.BaseProducerSendTest$CloseCallback$1$$anonfun$onCompletion$1.apply(BaseProducerSendTest.scala:415)
>   at 
> kafka.api.BaseProducerSendTest$CloseCallback$1$$anonfun$onCompletion$1.apply(BaseProducerSendTest.scala:415)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.api.BaseProducerSendTest$CloseCallback$1.onCompletion(BaseProducerSendTest.scala:415)
>   at 
> org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:107)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:318)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:278)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:364)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
>   at 
> org.apache.ka

[jira] [Commented] (KAFKA-4051) Strange behavior during rebalance when turning the OS clock back

2016-08-18 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426279#comment-15426279
 ] 

Rajini Sivaram commented on KAFKA-4051:
---

I think the issue is around the handling of timer tasks in Kafka. While task 
expiry is set using {{System.currentTimeMillis}}, which can move backwards (as 
reported in this JIRA), the internal timer in the TimingWheel in Kafka used to 
handle expiry is a monotonically increasing timer that starts from 
{{System.currentTimeMillis}}. This mismatch causes tasks to expire prematurely 
until {{System.currentTimeMillis}} catches up with the internal timer.

As [~ijuma] has pointed out on the mailing list, Kafka uses 
{{System.currentTimeMillis}} in a lot of places and switching to 
{{System.nanoTime}} everywhere could impact performance. We have a few choices 
for fixing this JIRA (in increasing order of complexity):

# We could switch over to {{System.nanoTime}} for TimerTasks alone to fix the 
issue with delayed tasks reported here
# It may be possible to change the timer implementation to recover better when 
wall clock time moves backwards
# Replace {{System.currentTimeMillis}} with {{System.nanoTime}} in time 
comparisons throughout Kafka code

I am inclined to do 1) and run performance tests, but am interested in what 
others think.
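To illustrate what option 1 buys: a deadline computed from {{System.nanoTime}} keeps advancing even if the wall clock is turned back, while a {{System.currentTimeMillis}}-based deadline does not. A small self-contained sketch (not Kafka's TimerTask code):

{code}
import java.util.concurrent.TimeUnit;

// Illustration only: a wall-clock deadline versus a monotonic one.
// System.currentTimeMillis can jump backwards with the OS clock;
// System.nanoTime is monotonic, so the second check is unaffected.
final class DeadlineDemo {
    public static void main(String[] args) throws InterruptedException {
        long delayMs = 100;

        long wallDeadline = System.currentTimeMillis() + delayMs;
        long monoDeadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);

        Thread.sleep(delayMs + 10);

        // If the OS clock were turned back an hour during the sleep, the first
        // expression could stay false for up to an hour; the second still fires.
        System.out.println("wall-clock expired: " + (System.currentTimeMillis() >= wallDeadline));
        System.out.println("monotonic expired:  " + (System.nanoTime() - monoDeadlineNanos >= 0));
    }
}
{code}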


> Strange behavior during rebalance when turning the OS clock back
> 
>
> Key: KAFKA-4051
> URL: https://issues.apache.org/jira/browse/KAFKA-4051
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.0.0
> Environment: OS: Ubuntu 14.04 - 64bits
>Reporter: Gabriel Ibarra
>Assignee: Rajini Sivaram
>
> If a rebalance is performed after turning the OS clock back, then the Kafka 
> server enters a loop and the rebalance cannot be completed until the system 
> clock returns to the previous date/time.
> Steps to Reproduce:
> - Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will become the 
> owner of all the partitions.
> - Turn the system (OS) clock back, for instance by 1 hour.
> - Start a new consumer for TOPIC_NAME using the same group id; it will force 
> a rebalance.
> After these actions the Kafka server logs constantly display the messages 
> below, and after a while both consumers stop receiving messages. This 
> condition lasts at least as long as the clock was turned back (1 hour in this 
> example), and after this time Kafka comes back to work.
> [2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 
> is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
> Because some systems may have NTP enabled, or an administrator may change the 
> system clock (date/time), it is important that this can be done safely. 
> Currently the only safe way to do it is the following:
> 1- Shut down the Kafka server.
> 2- Change the date/time.
> 3- Bring the Kafka server back up.
> But this approach only works if the change is made by an administrator, not by 
> NTP. Also, on many systems taking down the Kafka server might cause 
> information to be lost.



--
This message was

[jira] [Commented] (KAFKA-2170) 10 LogTest cases failed for file.renameTo failed under windows

2016-08-18 Thread Harald Kirsch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426278#comment-15426278
 ] 

Harald Kirsch commented on KAFKA-2170:
--

The patch does not seem to work completely for compaction. I applied the patch 
on 40b1dd3f495a59abef8a0cba5450526994c92c04, ran ./gradlew releaseTarGz and 
unpacked it on Windows 8.1. I ran the server with the following configs:
{noformat}
broker.id=0
listeners=PLAINTEXT://:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=C:\\Users\\hk\\tmp\\kafka-data
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=6000111
log.retention.check.interval.ms=30
zookeeper.connect=hal.intranet.raytion.com:2181/kafka
zookeeper.connection.timeout.ms=6000
log.cleaner.enable=true
log.cleanup.policy=compact
log.cleaner.min.cleanable.ratio=0.01
log.cleaner.backoff.ms=15000
log.segment.delete.delay.ms=600
auto.create.topics.enable=false
{noformat}

Then I fed the same set of docs twice. The cleaner logged successful 
activity, but the folder now contains {{.index.deleted}} files for all the old 
segments, like this:
{noformat}
ModeLastWriteTime Length Name
- -- 
-a---18.08.2016 12:33  0 .index
-18.08.2016 12:33176 .index.deleted
-a---18.08.2016 12:33  0 .log
-a---18.08.2016 12:33  0 0023.index
-18.08.2016 12:33192 0023.index.deleted
-a---18.08.2016 12:33 583752 0023.log
-a---18.08.2016 12:33 48 0048.index
-18.08.2016 12:33192 0048.index.deleted
-a---18.08.2016 12:335844328 0048.log
-a---18.08.2016 12:33 48 0073.index
-18.08.2016 12:33200 0073.index.deleted
-a---18.08.2016 12:335494169 0073.log
-a---18.08.2016 12:33   10485760 0099.index
-a---18.08.2016 12:331529831 0099.log
{noformat}

Then I fed the same set of messages a third time. As soon as the cleaner 
started working thereafter, it bombed out like this:
{noformat}
[2016-08-18 13:10:43,254] ERROR [kafka-log-cleaner-thread-0], Error due to  
(kafka.log.LogCleaner)
kafka.common.KafkaStorageException: Failed to change the index file suffix from 
 to .deleted for log segment 0
at kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:263)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:269)
at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:832)
at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:878)
at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:873)
at scala.collection.immutable.List.foreach(List.scala:318)
at kafka.log.Log.replaceSegments(Log.scala:873)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:395)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:343)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:342)
at scala.collection.immutable.List.foreach(List.scala:318)
at kafka.log.Cleaner.clean(LogCleaner.scala:342)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:237)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:215)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.nio.file.FileAlreadyExistsException: 
C:\Users\hk\tmp\kafka-data\hktest-0\.index -> 
C:\Users\hk\tmp\kafka-data\hktest-0\.index.deleted
at 
sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:81)
at 
sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
at 
sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
at java.nio.file.Files.move(Files.java:1395)
at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:670)
at kafka.log.OffsetIndex.renameTo(OffsetIndex.scala:364)
... 14 more
Suppressed: java.nio.file.AccessDeniedException: 
C:\Users\hk\tmp\kafka-data\hktest-0\.index -> 
C:\Users\hk\tmp\kafka-data\hktest-0\.index.deleted
at 
sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
at 
sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
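// The frames above show Utils.atomicMoveWithFallback calling java.nio.file.Files.move.
// Below is a hedged sketch of that rename-with-fallback pattern (an illustration,
// not the actual Kafka change): adding REPLACE_EXISTING to the fallback overwrites
// a stale target such as a leftover ".index.deleted" instead of failing with
// FileAlreadyExistsException.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RenameWithFallback {
    static void move(Path source, Path target) throws IOException {
        try {
            // Try an atomic rename first.
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException atomicMoveFailed) {
            // Fall back to a plain move that replaces any existing target.
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}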
   

[jira] [Commented] (KAFKA-4051) Strange behavior during rebalance when turning the OS clock back

2016-08-18 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426290#comment-15426290
 ] 

Rajini Sivaram commented on KAFKA-4051:
---

[~ijuma] As you have pointed out, there are inevitably going to be issues since 
Kafka uses System.currentTimeMillis in so many places. But typically, you would 
expect a single clock change to cause one expiry and thereafter continue to 
work with the changed timer (e.g. a producer metadata fetch expires and the retry 
works since it is on the updated clock). The issue in this JIRA is that the broker 
doesn't recover until the wall clock time reaches the previously set time. I 
imagine changing the clock back by an hour is an uncommon scenario, but the 
impact is quite big if it does happen. If we are fixing this issue, it will be 
useful to have a system test to check that Kafka continues to function after a 
major clock change.

> Strange behavior during rebalance when turning the OS clock back
> 
>
> Key: KAFKA-4051
> URL: https://issues.apache.org/jira/browse/KAFKA-4051
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.0.0
> Environment: OS: Ubuntu 14.04 - 64bits
>Reporter: Gabriel Ibarra
>Assignee: Rajini Sivaram
>
> If a rebalance is performed after turning the OS clock back, then the Kafka 
> server enters a loop and the rebalance cannot be completed until the system 
> clock returns to the previous date/time.
> Steps to Reproduce:
> - Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will become the 
> owner of all the partitions.
> - Turn the system (OS) clock back, for instance by 1 hour.
> - Start a new consumer for TOPIC_NAME using the same group id; it will force 
> a rebalance.
> After these actions the Kafka server logs constantly display the messages 
> below, and after a while both consumers stop receiving messages. This 
> condition lasts at least as long as the clock was turned back (1 hour in this 
> example), and after this time Kafka comes back to work.
> [2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 
> is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
> Because some systems may have NTP enabled, or an administrator may change the 
> system clock (date/time), it is important that this can be done safely. 
> Currently the only safe way to do it is the following:
> 1- Shut down the Kafka server.
> 2- Change the date/time.
> 3- Bring the Kafka server back up.
> But this approach only works if the change is made by an administrator, not by 
> NTP. Also, on many systems taking down the Kafka server might cause 
> information to be lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4051) Strange behavior during rebalance when turning the OS clock back

2016-08-18 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426292#comment-15426292
 ] 

Ismael Juma commented on KAFKA-4051:


[~rsivaram], option 1 seems worth a try. Note that it would have to be tested 
on Linux and Mac to verify that it fixes the underlying issue (it would be good 
to test on Windows too, but we have known issues on that platform, so it should 
not be a blocker, I guess). The JDK implementation relies on the OS (sometimes 
with JDK-level workarounds), so it's platform dependent.

[~gwenshap] had some concerns on the usage of System.nanoTime with regards to 
the impact on loaded systems so it would be good to get her thoughts too.

> Strange behavior during rebalance when turning the OS clock back
> 
>
> Key: KAFKA-4051
> URL: https://issues.apache.org/jira/browse/KAFKA-4051
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.0.0
> Environment: OS: Ubuntu 14.04 - 64bits
>Reporter: Gabriel Ibarra
>Assignee: Rajini Sivaram
>
> If a rebalance is performed after turning the OS clock back, then the Kafka 
> server enters a loop and the rebalance cannot be completed until the system 
> clock returns to the previous date/time.
> Steps to Reproduce:
> - Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will become the 
> owner of all the partitions.
> - Turn the system (OS) clock back, for instance by 1 hour.
> - Start a new consumer for TOPIC_NAME using the same group id; it will force 
> a rebalance.
> After these actions the Kafka server logs constantly display the messages 
> below, and after a while both consumers stop receiving messages. This 
> condition lasts at least as long as the clock was turned back (1 hour in this 
> example), and after this time Kafka comes back to work.
> [2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 
> is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
> Because some systems may have NTP enabled, or an administrator may change the 
> system clock (date/time), it is important that this can be done safely. 
> Currently the only safe way to do it is the following:
> 1- Shut down the Kafka server.
> 2- Change the date/time.
> 3- Bring the Kafka server back up.
> But this approach only works if the change is made by an administrator, not by 
> NTP. Also, on many systems taking down the Kafka server might cause 
> information to be lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1737: KAFKA-4038: Transient failure in DeleteTopicsReque...

2016-08-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1737


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4038) Transient failure in DeleteTopicsRequestTest.testErrorDeleteTopicRequests

2016-08-18 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4038:
---
   Resolution: Fixed
Fix Version/s: 0.10.1.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1737
[https://github.com/apache/kafka/pull/1737]

> Transient failure in DeleteTopicsRequestTest.testErrorDeleteTopicRequests
> -
>
> Key: KAFKA-4038
> URL: https://issues.apache.org/jira/browse/KAFKA-4038
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>Assignee: Grant Henke
> Fix For: 0.10.1.0
>
>
> {code}
> java.lang.AssertionError: The response error should match 
> Expected :REQUEST_TIMED_OUT
> Actual   :NONE
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> kafka.server.DeleteTopicsRequestTest$$anonfun$validateErrorDeleteTopicRequests$1.apply(DeleteTopicsRequestTest.scala:89)
>   at 
> kafka.server.DeleteTopicsRequestTest$$anonfun$validateErrorDeleteTopicRequests$1.apply(DeleteTopicsRequestTest.scala:88)
>   at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>   at 
> kafka.server.DeleteTopicsRequestTest.validateErrorDeleteTopicRequests(DeleteTopicsRequestTest.scala:88)
>   at 
> kafka.server.DeleteTopicsRequestTest.testErrorDeleteTopicRequests(DeleteTopicsRequestTest.scala:76)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4038) Transient failure in DeleteTopicsRequestTest.testErrorDeleteTopicRequests

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426295#comment-15426295
 ] 

ASF GitHub Bot commented on KAFKA-4038:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1737


> Transient failure in DeleteTopicsRequestTest.testErrorDeleteTopicRequests
> -
>
> Key: KAFKA-4038
> URL: https://issues.apache.org/jira/browse/KAFKA-4038
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>Assignee: Grant Henke
> Fix For: 0.10.1.0
>
>
> {code}
> java.lang.AssertionError: The response error should match 
> Expected :REQUEST_TIMED_OUT
> Actual   :NONE
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> kafka.server.DeleteTopicsRequestTest$$anonfun$validateErrorDeleteTopicRequests$1.apply(DeleteTopicsRequestTest.scala:89)
>   at 
> kafka.server.DeleteTopicsRequestTest$$anonfun$validateErrorDeleteTopicRequests$1.apply(DeleteTopicsRequestTest.scala:88)
>   at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>   at 
> kafka.server.DeleteTopicsRequestTest.validateErrorDeleteTopicRequests(DeleteTopicsRequestTest.scala:88)
>   at 
> kafka.server.DeleteTopicsRequestTest.testErrorDeleteTopicRequests(DeleteTopicsRequestTest.scala:76)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3875) Transient test failure: kafka.api.SslProducerSendTest.testSendNonCompressedMessageWithCreateTime

2016-08-18 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426268#comment-15426268
 ] 

Ismael Juma edited comment on KAFKA-3875 at 8/18/16 11:52 AM:
--

[~junrao], we had another failure for this test:

{code}
java.lang.AssertionError: Should have offset 100 but only successfully sent 0 
expected:<100> but was:<0>
...
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 
record(s) for topic-0 due to 9223372036854775801 ms has passed since batch 
creation plus linger time
{code}

That very large number (Long.MaxValue - 6) seems suspicious, overflow perhaps? 
Also, I think our logging is a bit confusing as "(now - (this.createdMs + 
lingerMs))" can give negative numbers even without overflow (if the time 
between now and created is smaller than lingerMs).

https://jenkins.confluent.io/job/kafka-trunk/1074/testReport/junit/kafka.api/SslProducerSendTest/testSendNonCompressedMessageWithCreateTime/
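To make both points concrete (a negative result without any overflow, and wraparound when the inner sum overflows), a small demo with invented values:

{code}
// Invented values, for illustration only: shows how "now - (createdMs + lingerMs)"
// can be negative without overflow, and how it wraps around Long.MIN_VALUE /
// Long.MAX_VALUE when the inner sum overflows. A result within a few units of
// Long.MAX_VALUE or Long.MIN_VALUE, like the 9223372036854775801 in the log,
// is the usual signature of this kind of wraparound.
public class ExpiryArithmeticDemo {
    public static void main(String[] args) {
        long now = 1_000_000L, createdMs = 999_990L, lingerMs = 100L;
        // Negative without overflow: now is still inside createdMs + lingerMs.
        System.out.println(now - (createdMs + lingerMs)); // prints -90

        // With a huge lingerMs the inner sum overflows and the result wraps.
        lingerMs = Long.MAX_VALUE;
        System.out.println(now - (createdMs + lingerMs)); // prints -9223372036854775797
    }
}
{code}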


was (Author: ijuma):
[~junrao], we had another failure for this test:

{code}
java.lang.AssertionError: Should have offset 100 but only successfully sent 0 
expected:<100> but was:<0>
...
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 
record(s) for topic-0 due to 9223372036854775801 ms has passed since batch 
creation plus linger time
{code}

That very large number seems suspicious.

https://jenkins.confluent.io/job/kafka-trunk/1074/testReport/junit/kafka.api/SslProducerSendTest/testSendNonCompressedMessageWithCreateTime/

> Transient test failure: 
> kafka.api.SslProducerSendTest.testSendNonCompressedMessageWithCreateTime
> 
>
> Key: KAFKA-3875
> URL: https://issues.apache.org/jira/browse/KAFKA-3875
> Project: Kafka
>  Issue Type: Sub-task
>  Components: unit tests
>Reporter: Ismael Juma
>Assignee: Jun Rao
>  Labels: transient-unit-test-failure
> Fix For: 0.10.1.0
>
>
> It failed in a couple of builds.
> {code}
> java.lang.AssertionError: Should have offset 100 but only successfully sent 0 
> expected:<100> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> kafka.api.BaseProducerSendTest.sendAndVerifyTimestamp(BaseProducerSendTest.scala:228)
>   at 
> kafka.api.BaseProducerSendTest.testSendNonCompressedMessageWithCreateTime(BaseProducerSendTest.scala:170)
> {code}
> And standard out:
> {code}
> org.scalatest.junit.JUnitTestFailedError: Send callback returns the following 
> exception
>   at 
> org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:103)
>   at 
> org.scalatest.junit.JUnitSuite.newAssertionFailedException(JUnitSuite.scala:79)
>   at org.scalatest.Assertions$class.fail(Assertions.scala:1348)
>   at org.scalatest.junit.JUnitSuite.fail(JUnitSuite.scala:79)
>   at 
> kafka.api.BaseProducerSendTest$callback$4$.onCompletion(BaseProducerSendTest.scala:209)
>   at 
> org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:109)
>   at 
> org.apache.kafka.clients.producer.internals.RecordBatch.maybeExpire(RecordBatch.java:155)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:245)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Batch containing 
> 1 record(s) expired due to timeout while requesting metadata from brokers for 
> topic-0
> java.lang.IllegalStateException: Cannot send after the producer is closed.
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:172)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:466)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:430)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:353)
>   at 
> kafka.api.BaseProducerSendTest$CloseCallback$1$$anonfun$onCompletion$1.apply(BaseProducerSendTest.scala:415)
>   at 
> kafka.api.BaseProducerSendTest$CloseCallback$1$$anonfun$onCompletion$1.apply(BaseProducerSendTest.scala:415)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>  

[jira] [Created] (KAFKA-4061) Apache Kafka failover is not working

2016-08-18 Thread Sebastian Bruckner (JIRA)
Sebastian Bruckner created KAFKA-4061:
-

 Summary: Apache Kafka failover is not working
 Key: KAFKA-4061
 URL: https://issues.apache.org/jira/browse/KAFKA-4061
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.0.0
 Environment: Linux
Reporter: Sebastian Bruckner


We have a 3-node cluster (kafka1 to kafka3) on 0.10.0.0.

When I shut down the node kafka1 I can see the following in the debug logs of 
my consumers:

{code}
Sending coordinator request for group f49dc74f-3ccb-4fef-bafc-a7547fe26bc8 to 
broker kafka3:9092 (id: 3 rack: null)

Received group coordinator response 
ClientResponse(receivedTimeMs=1471511333843, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@3892b449,
 
request=RequestSend(header={api_key=10,api_version=0,correlation_id=118,client_id=f49dc74f-3ccb-4fef-bafc-a7547fe26bc8},
 body={group_id=f49dc74f-3ccb-4fef-bafc-a7547fe26bc8}), 
createdTimeMs=1471511333794, sendTimeMs=1471511333794), 
responseBody={error_code=0,coordinator={node_id=1,host=kafka1,port=9092}})
{code}

So the problem is that kafka3 answers with a response telling the consumer 
that the coordinator is kafka1 (which is shut down).

This then happens over and over again.

When I restart the consumer I see the following:

{code}
Updated cluster metadata version 1 to Cluster(nodes = [kafka2:9092 (id: -2 
rack: null), kafka1:9092 (id: -1 rack: null), kafka3:9092 (id: -3 rack: null)], 
partitions = [])

... responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
{code}

The difference now is that it answers with error code 15 
(GROUP_COORDINATOR_NOT_AVAILABLE).

Somehow Kafka doesn't elect a new group coordinator.

In a local setup with 2 brokers and 1 ZooKeeper it works fine.

Can you help me debug this?
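For context on why the coordinator can stay pinned to kafka1: the group's coordinator is the current leader of one partition of the internal __consumer_offsets topic, chosen by hashing the group id, so failover only happens if that partition's leadership moves to another replica. A minimal sketch of the partition choice (the partition count of 50 is the default offsets.topic.num.partitions and is an assumption here, not taken from this cluster):

{code}
// Sketch of how a group id maps to a partition of __consumer_offsets; the
// leader of that partition acts as the group coordinator. This is a
// simplified stand-in for Kafka's abs(groupId.hashCode) % partitionCount.
public final class CoordinatorPartition {
    static int partitionFor(String groupId, int offsetsTopicPartitionCount) {
        return (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitionCount;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("f49dc74f-3ccb-4fef-bafc-a7547fe26bc8", 50));
    }
}
{code}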



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #821

2016-08-18 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-4038; Transient failure in

--
[...truncated 12008 lines...]
org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoinTest > 
testNotSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoinTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueWithProvidedSerde STARTED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueWithProvidedSerde PASSED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValuesWithName STARTED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValuesWithName PASSED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueDefaultSerde STARTED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueDefaultSerde PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testNotSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testKTable STARTED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testKTable PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testValueGetter 
STARTED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testValueGetter 
PASSED

org.apache.kafka.streams.kstream.internals.KTableMapKeysTest > 
testMapKeysConvertingToStream STARTED

org.apache.kafka.streams.kstream.internals.KTableMapKeysTest > 
testMapKeysConvertingToStream PASSED

org.apache.kafka.streams.kstream.internals.KStreamForeachTest > testForeach 
STARTED

org.apache.kafka.streams.kstream.internals.KStreamForeachTest > testForeach 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > testJoin 
STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testNotSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testOuterJoin STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testOuterJoin PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > testJoin 
STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testWindowing STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testWindowing PASSED

org.apache.kafka.streams.kstream.internals.KStreamFlatMapValuesTest > 
testFlatMapValues STARTED

org.apache.kafka.streams.kstream.internals.KStreamFlatMapValuesTest > 
testFlatMapValues PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > testJoin 
STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testNotSendingOldValues STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testNotSendingOldValues PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testSendingOldValues STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testSendingOldValues PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testAggBasic 
STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testAggBasic 
PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testCount 
STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testCount 
PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggRepartition STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggRepartition PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testRemoveOldBeforeAddNew STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testRemoveOldBeforeAddNew PASSED

org.apache.kafka.streams.kstream.internals.KStreamFilterTest > testFilterNot 
STARTED

org.apache.kafka.streams.kstream.internals

Re: [VOTE] KIP:71 Enable log compaction and deletion to co-exist

2016-08-18 Thread Damian Guy
The vote is now concluded and KIP-71 has been accepted.

Thanks to everyone for your input.

Regards,
Damian

On Tue, 16 Aug 2016 at 01:25 Joel Koshy  wrote:

> +1
>
> On Mon, Aug 15, 2016 at 4:58 PM, Ewen Cheslack-Postava 
> wrote:
>
> > +1 (binding)
> >
> > Thanks,
> > -Ewen
> >
> > On Mon, Aug 15, 2016 at 4:26 PM, Jun Rao  wrote:
> >
> > > Thanks for the proposal. +1
> > >
> > > Jun
> > >
> > > On Mon, Aug 15, 2016 at 6:20 AM, Damian Guy 
> > wrote:
> > >
> > > > I would like to initiate the voting process for KIP-71 (
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > > > ).
> > > >
> > > > This change will add a new cleanup.policy, compact_and_delete, that
> > when
> > > > enabled will run both compaction and deletion.
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>


[jira] [Commented] (KAFKA-4056) Kafka logs values of sensitive configs like passwords

2016-08-18 Thread Mickael Maison (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426521#comment-15426521
 ] 

Mickael Maison commented on KAFKA-4056:
---

Using trunk, I don't see known but unused configurations (like 
ssl.truststore.password) being logged. Only unknown configurations are listed.

I think the easiest fix is probably to log only the unknown config key and not 
the value. Like this:
17:53:33,722 WARN [o.a.k.c.c.ConsumerConfig] - The configuration 
ssl.truststore.password was supplied but isn't a known config.
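
A sketch of the shape of the change I have in mind (the class and method names 
here are just illustrative, not the actual location in the clients code):

{code}
import java.util.Set;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: log just the key for unknown configs, never the value, since the
// value may be a password or other secret.
public class UnusedConfigLogger {
    private static final Logger log = LoggerFactory.getLogger(UnusedConfigLogger.class);

    public static void logUnknown(Set<String> unknownConfigNames) {
        for (String name : unknownConfigNames) {
            log.warn("The configuration '{}' was supplied but isn't a known config.", name);
        }
    }
}
{code}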

> Kafka logs values of sensitive configs like passwords
> -
>
> Key: KAFKA-4056
> URL: https://issues.apache.org/jira/browse/KAFKA-4056
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: jaikiran pai
>Assignee: Mickael Maison
>
> From the mail discussion here: 
> https://www.mail-archive.com/dev@kafka.apache.org/msg55012.html
> {quote}
> We are using 0.9.0.1 of Kafka (Java) libraries for our Kafka consumers and 
> producers. In one of our consumers, our consumer config had a SSL specific 
> property which ended up being used against a non-SSL Kafka broker port. As a 
> result, the logs ended up seeing messages like:
> 17:53:33,722 WARN [o.a.k.c.c.ConsumerConfig] - The configuration 
> *ssl.truststore.password = foobar* was supplied but isn't a known config.
> The log message is fine and makes sense, but can Kafka please not log the 
> values of the properties and instead just include the config name which it 
> considers as unknown? That way it won't end up logging these potentially 
> sensitive values. I understand that only those with access to these log files 
> can end up seeing these values but even then some of our internal processes 
> forbid logging such sensitive information to the logs. This log message will 
> still end up being useful if only the config name is logged without the 
> value. 
> {quote}
> Apparently (as noted in that thread), there's already code in the Kafka 
> library which masks sensitive values like passwords, but it looks like 
> there's a bug where it unintentionally logs these raw values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4056) Kafka logs values of sensitive configs like passwords

2016-08-18 Thread jaikiran pai (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426538#comment-15426538
 ] 

jaikiran pai commented on KAFKA-4056:
-

[~mimaison], we ran into this on our system when we created a consumer and 
passed consumer properties which included the ssl.truststore.password property 
and a *non-SSL* broker port via the bootstrap.servers config property (it 
pointed to localhost:9092 instead of an SSL port, and we had both plaintext and 
SSL listeners enabled on the broker).


> Kafka logs values of sensitive configs like passwords
> -
>
> Key: KAFKA-4056
> URL: https://issues.apache.org/jira/browse/KAFKA-4056
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: jaikiran pai
>Assignee: Mickael Maison
>
> From the mail discussion here: 
> https://www.mail-archive.com/dev@kafka.apache.org/msg55012.html
> {quote}
> We are using 0.9.0.1 of Kafka (Java) libraries for our Kafka consumers and 
> producers. In one of our consumers, our consumer config had a SSL specific 
> property which ended up being used against a non-SSL Kafka broker port. As a 
> result, the logs ended up seeing messages like:
> 17:53:33,722 WARN [o.a.k.c.c.ConsumerConfig] - The configuration 
> *ssl.truststore.password = foobar* was supplied but isn't a known config.
> The log message is fine and makes sense, but can Kafka please not log the 
> values of the properties and instead just include the config name which it 
> considers as unknown? That way it won't end up logging these potentially 
> sensitive values. I understand that only those with access to these log files 
> can end up seeing these values but even then some of our internal processes 
> forbid logging such sensitive information to the logs. This log message will 
> still end up being useful if only the config name is logged without the 
> value. 
> {quote}
> Apparently (as noted in that thread), there's already code in the Kafka 
> library which masks sensitive values like passwords, but it looks like 
> there's a bug where it unintentionally logs these raw values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4056) Kafka logs values of sensitive configs like passwords

2016-08-18 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426541#comment-15426541
 ] 

Ismael Juma commented on KAFKA-4056:


[~mimaison], the key thing to trigger this issue is to set the config, but use 
PLAINTEXT as the security protocol (what is unused is determined dynamically).

The option you suggest is the same one Jaikiran suggested in the mailing list. 
It's probably fine. The other option is to use `values` instead of `originals`, 
but maybe that's more confusing.

> Kafka logs values of sensitive configs like passwords
> -
>
> Key: KAFKA-4056
> URL: https://issues.apache.org/jira/browse/KAFKA-4056
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: jaikiran pai
>Assignee: Mickael Maison
>
> From the mail discussion here: 
> https://www.mail-archive.com/dev@kafka.apache.org/msg55012.html
> {quote}
> We are using 0.9.0.1 of Kafka (Java) libraries for our Kafka consumers and 
> producers. In one of our consumers, our consumer config had a SSL specific 
> property which ended up being used against a non-SSL Kafka broker port. As a 
> result, the logs ended up seeing messages like:
> 17:53:33,722 WARN [o.a.k.c.c.ConsumerConfig] - The configuration 
> *ssl.truststore.password = foobar* was supplied but isn't a known config.
> The log message is fine and makes sense, but can Kafka please not log the 
> values of the properties and instead just include the config name which it 
> considers as unknown? That way it won't end up logging these potentially 
> sensitive values. I understand that only those with access to these log files 
> can end up seeing these values but even then some of our internal processes 
> forbid logging such sensitive information to the logs. This log message will 
> still end up being useful if only the config name is logged without the 
> value. 
> {quote}
> Apparently (as noted in that thread), there's already code in the Kafka 
> library which masks sensitive values like passwords, but it looks like 
> there's a bug where it unintentionally logs these raw values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk7 #1481

2016-08-18 Thread Apache Jenkins Server
See 



Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Grant Henke
Congratulations Gwen!



On Thu, Aug 18, 2016 at 2:36 AM, Ismael Juma  wrote:

> Congratulations Gwen! Great news.
>
> Ismael
>
> On 18 Aug 2016 2:44 am, "Jun Rao"  wrote:
>
> > Hi, Everyone,
> >
> > Gwen Shapira has been active in the Kafka community since she became a
> > Kafka committer
> > about a year ago. I am glad to announce that Gwen is now a member of
> Kafka
> > PMC.
> >
> > Congratulations, Gwen!
> >
> > Jun
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Commented] (KAFKA-4056) Kafka logs values of sensitive configs like passwords

2016-08-18 Thread Mickael Maison (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426576#comment-15426576
 ] 

Mickael Maison commented on KAFKA-4056:
---

This is what I've been trying :) Connecting to a broker on 9092 using PLAINTEXT.

I can reproduce it fine with 0.9.0.1, but with trunk I don't get this output. It 
only lists unknown configs. Known but unused configs appear in the normal 
Consumer/Producer config output, and they are correctly hidden in the case of 
passwords.

> Kafka logs values of sensitive configs like passwords
> -
>
> Key: KAFKA-4056
> URL: https://issues.apache.org/jira/browse/KAFKA-4056
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: jaikiran pai
>Assignee: Mickael Maison
>
> From the mail discussion here: 
> https://www.mail-archive.com/dev@kafka.apache.org/msg55012.html
> {quote}
> We are using 0.9.0.1 of Kafka (Java) libraries for our Kafka consumers and 
> producers. In one of our consumers, our consumer config had a SSL specific 
> property which ended up being used against a non-SSL Kafka broker port. As a 
> result, the logs ended up seeing messages like:
> 17:53:33,722 WARN [o.a.k.c.c.ConsumerConfig] - The configuration 
> *ssl.truststore.password = foobar* was supplied but isn't a known config.
> The log message is fine and makes sense, but can Kafka please not log the 
> values of the properties and instead just include the config name which it 
> considers as unknown? That way it won't end up logging these potentially 
> sensitive values. I understand that only those with access to these log files 
> can end up seeing these values but even then some of our internal processes 
> forbid logging such sensitive information to the logs. This log message will 
> still end up being useful if only the config name is logged without the 
> value. 
> {quote}
> Apparently (as noted in that thread), there's already code in the Kafka 
> library which masks sensitive values like passwords, but it looks like 
> there's a bug where it unintentionally logs these raw values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4039) Exit Strategy: using exceptions instead of inline invocation of exit/halt

2016-08-18 Thread Alexey Ozeritskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Ozeritskiy updated KAFKA-4039:
-
Attachment: deadlock-stack2

> Exit Strategy: using exceptions instead of inline invocation of exit/halt
> -
>
> Key: KAFKA-4039
> URL: https://issues.apache.org/jira/browse/KAFKA-4039
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Maysam Yabandeh
> Attachments: deadlock-stack2
>
>
> The current practice is to directly invoke halt/exit right after the line 
> that intends to terminate the execution. In the case of System.exit this 
> could cause deadlocks if the thread invoking System.exit is holding a lock 
> that will be requested by the shutdown hook threads that will be started by 
> System.exit. An example is reported by [~aozeritsky] in KAFKA-3924. This 
> would also make testing more difficult, as it would require mocking static 
> methods of the System and Runtime classes, which is not natively supported in 
> Java.
> One alternative suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-3924?focusedCommentId=15420269&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15420269]
>  would be to throw some dedicated exceptions that will eventually invoke 
> exit/halt:
> {quote} it would be great to move away from executing `System.exit` inline in 
> favour of throwing an exception (two examples, but maybe we can find better 
> names: FatalExitException and FatalHaltException) that is caught by some 
> central code that then does the `System.exit` or `Runtime.getRuntime.halt`. 
> This helps in a couple of ways:
> (1) Avoids issues with locks being held as in this issue
> (2) It makes it possible to abstract the action, which is very useful in 
> tests. At the moment, we can't easily test for these conditions as they cause 
> the whole test harness to exit. Worse, these conditions are sometimes 
> triggered in the tests and it's unclear why.
> (3) We can have more consistent logging around these actions and possibly 
> extended logging for tests
> {quote}
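
For illustration, the suggested pattern could look roughly like this (the 
exception name comes from the quote above; everything else, including the 
ExitRunner wrapper, is an assumption, not the actual Kafka code):

{code}
// Sketch only: a dedicated exception plus one central place that calls System.exit.
public final class ExitRunner {

    // Thrown instead of calling System.exit inline; carries the intended status code.
    public static class FatalExitException extends RuntimeException {
        private final int statusCode;
        public FatalExitException(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }
        public int statusCode() { return statusCode; }
    }

    private ExitRunner() { }

    // The single place that terminates the JVM. Because the exception unwinds the
    // task's stack first, any locks acquired inside the task are released before
    // System.exit runs its shutdown hooks, and tests can run the task directly and
    // assert on the exception instead of exiting the test harness.
    public static void run(Runnable task) {
        try {
            task.run();
        } catch (FatalExitException e) {
            System.err.println("Fatal error, shutting down: " + e.getMessage());
            System.exit(e.statusCode());
        }
    }
}
{code}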



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4056) Kafka logs values of sensitive configs like passwords

2016-08-18 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426594#comment-15426594
 ] 

Ismael Juma commented on KAFKA-4056:


Interesting, maybe it's been fixed then.

> Kafka logs values of sensitive configs like passwords
> -
>
> Key: KAFKA-4056
> URL: https://issues.apache.org/jira/browse/KAFKA-4056
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: jaikiran pai
>Assignee: Mickael Maison
>
> From the mail discussion here: 
> https://www.mail-archive.com/dev@kafka.apache.org/msg55012.html
> {quote}
> We are using 0.9.0.1 of Kafka (Java) libraries for our Kafka consumers and 
> producers. In one of our consumers, our consumer config had a SSL specific 
> property which ended up being used against a non-SSL Kafka broker port. As a 
> result, the logs ended up seeing messages like:
> 17:53:33,722 WARN [o.a.k.c.c.ConsumerConfig] - The configuration 
> *ssl.truststore.password = foobar* was supplied but isn't a known config.
> The log message is fine and makes sense, but can Kafka please not log the 
> values of the properties and instead just include the config name which it 
> considers as unknown? That way it won't end up logging these potentially 
> sensitive values. I understand that only those with access to these log files 
> can end up seeing these values but even then some of our internal processes 
> forbid logging such sensitive information to the logs. This log message will 
> still end up being useful if only the config name is logged without the 
> value. 
> {quote}
> Apparently (as noted in that thread), there's already code in the Kafka 
> library which masks sensitive values like passwords, but it looks like 
> there's a bug where it unintentionally logs these raw values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1759: KAFKA-4056: Kafka logs values of sensitive configs...

2016-08-18 Thread mimaison
GitHub user mimaison opened a pull request:

https://github.com/apache/kafka/pull/1759

KAFKA-4056: Kafka logs values of sensitive configs like passwords

In case of unknown configs, only list the name without the value

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mimaison/kafka KAFKA-4056

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1759.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1759


commit bd3c0a9ceb2661cd027319bae17fbd3c6df7d54d
Author: Mickael Maison 
Date:   2016-08-18T15:01:26Z

KAFKA-4056: Kafka logs values of sensitive configs like passwords

In case of unknown configs, only list the name without the value




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4056) Kafka logs values of sensitive configs like passwords

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426600#comment-15426600
 ] 

ASF GitHub Bot commented on KAFKA-4056:
---

GitHub user mimaison opened a pull request:

https://github.com/apache/kafka/pull/1759

KAFKA-4056: Kafka logs values of sensitive configs like passwords

In case of unknown configs, only list the name without the value

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mimaison/kafka KAFKA-4056

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1759.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1759


commit bd3c0a9ceb2661cd027319bae17fbd3c6df7d54d
Author: Mickael Maison 
Date:   2016-08-18T15:01:26Z

KAFKA-4056: Kafka logs values of sensitive configs like passwords

In case of unknown configs, only list the name without the value




> Kafka logs values of sensitive configs like passwords
> -
>
> Key: KAFKA-4056
> URL: https://issues.apache.org/jira/browse/KAFKA-4056
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: jaikiran pai
>Assignee: Mickael Maison
>
> From the mail discussion here: 
> https://www.mail-archive.com/dev@kafka.apache.org/msg55012.html
> {quote}
> We are using 0.9.0.1 of Kafka (Java) libraries for our Kafka consumers and 
> producers. In one of our consumers, our consumer config had a SSL specific 
> property which ended up being used against a non-SSL Kafka broker port. As a 
> result, the logs ended up seeing messages like:
> 17:53:33,722 WARN [o.a.k.c.c.ConsumerConfig] - The configuration 
> *ssl.truststore.password = foobar* was supplied but isn't a known config.
> The log message is fine and makes sense, but can Kafka please not log the 
> values of the properties and instead just include the config name which it 
> considers as unknown? That way it won't end up logging these potentially 
> sensitive values. I understand that only those with access to these log files 
> can end up seeing these values but even then some of our internal processes 
> forbid logging such sensitive information to the logs. This log message will 
> still end up being useful if only the config name is logged without the 
> value. 
> {quote}
> Apparently (as noted in that thread), there's already code in the Kafka 
> library which masks sensitive values like passwords, but it looks like 
> there's a bug where it unintentionally logs these raw values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4056) Kafka logs values of sensitive configs like passwords

2016-08-18 Thread Mickael Maison (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426603#comment-15426603
 ] 

Mickael Maison commented on KAFKA-4056:
---

I've sent a PR to only log the name in the case of unknown configs.

> Kafka logs values of sensitive configs like passwords
> -
>
> Key: KAFKA-4056
> URL: https://issues.apache.org/jira/browse/KAFKA-4056
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: jaikiran pai
>Assignee: Mickael Maison
>
> From the mail discussion here: 
> https://www.mail-archive.com/dev@kafka.apache.org/msg55012.html
> {quote}
> We are using 0.9.0.1 of Kafka (Java) libraries for our Kafka consumers and 
> producers. In one of our consumers, our consumer config had a SSL specific 
> property which ended up being used against a non-SSL Kafka broker port. As a 
> result, the logs ended up seeing messages like:
> 17:53:33,722 WARN [o.a.k.c.c.ConsumerConfig] - The configuration 
> *ssl.truststore.password = foobar* was supplied but isn't a known config.
> The log message is fine and makes sense, but can Kafka please not log the 
> values of the properties and instead just include the config name which it 
> considers as unknown? That way it won't end up logging these potentially 
> sensitive values. I understand that only those with access to these log files 
> can end up seeing these values but even then some of our internal processes 
> forbid logging such sensitive information to the logs. This log message will 
> still end up being useful if only the config name is logged without the 
> value. 
> {quote}
> Apparently (as noted in that thread), there's already code in the Kafka 
> library which masks sensitive values like passwords, but it looks like 
> there's a bug where it unintentionally logs these raw values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-74: Add Fetch Response Size Limit in Bytes

2016-08-18 Thread Andrey L. Neporada
Hi, Jason!

> On 17 Aug 2016, at 21:53, Jason Gustafson  wrote:
> 
> Hi Andrey,
> 
> Thanks for picking this up and apologies for the late comment.
> 
> One thing worth mentioning is that the consumer actually sends multiple
> parallel fetch requests, one for each broker that is hosting some of the
> assigned partitions. Unless you were planning to modify this behavior, this
> KIP actually changes the maximum memory used by the consumer from
> 
> max.partition.fetch.bytes * num_partitions
> 
> to
> 
> fetch.response.max.bytes * num_brokers
> 
> I guess it's really the minimum of the two values since
> max.partition.fetch.bytes is still supported. I think this is still a very
> helpful feature, but it's probably worth calling this out in the KIP.

Good point. I'll add a comment about it.
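
For example (purely illustrative numbers): with 1,000 assigned partitions spread 
across 10 brokers, max.partition.fetch.bytes=1MB bounds buffered fetch data at 
roughly 1,000 MB, while fetch.response.max.bytes=50MB bounds it at roughly 
500 MB; the effective bound is the smaller of the two.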

> 
> Also, one question on naming: would it make sense to change
> "fetch.response.max.bytes" to "max.fetch.bytes"? Seems to fit nicer with
> "max.partition.fetch.bytes”.
> 

I have no objections. However, we have already initiated the voting procedure on 
this KIP, so I am a bit unsure whether I can change the KIP now.
Jun, what do you think?


> 
> Thanks,
> Jason
> 
> 

Thanks,
Andrey.



Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Ashish Singh
Congrats Gwen!

On Thursday, August 18, 2016, Grant Henke  wrote:

> Congratulations Gwen!
>
>
>
> On Thu, Aug 18, 2016 at 2:36 AM, Ismael Juma  > wrote:
>
> > Congratulations Gwen! Great news.
> >
> > Ismael
> >
> > On 18 Aug 2016 2:44 am, "Jun Rao" >
> wrote:
> >
> > > Hi, Everyone,
> > >
> > > Gwen Shapira has been active in the Kafka community since she became a
> > > Kafka committer
> > > about a year ago. I am glad to announce that Gwen is now a member of
> > Kafka
> > > PMC.
> > >
> > > Congratulations, Gwen!
> > >
> > > Jun
> > >
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com  | twitter.com/gchenke |
> linkedin.com/in/granthenke
>


-- 
Ashish 🎤h


Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Ben Stopford
Hi Joel

Ha! Yes, we had some similar thoughts, on both counts. Both are actually good 
approaches, but they come with some extra complexity. 

Segregating the replication type is tempting as it creates a more general 
solution. One issue is that you need to draw a line between lagging and not 
lagging. The ISR 'limit' is a tempting divider, but it has the side effect that, 
once you drop out, you get throttled immediately. Adding a configurable divider 
is another option, but it is difficult for admins to set, and always a little 
arbitrary. A better idea is to prioritise in reverse order of lag, but that also 
comes with additional complexity of its own. 

Under-throttling is also a tempting addition. That is to say, if there's idle 
bandwidth lying around, not being used, why not use it to let lagging brokers 
catch up? This involves some comparison to the maximum bandwidth, which could be 
configurable or derived, with pros and cons for each. 

But the more general problem is actually quite hard to reason about, so after 
some discussion we decided to settle on something simple that we felt we could 
get working, and to add these additional features in subsequent KIPs. 

I hope that seems reasonable. Jun may wish to add to this. 

B


> On 18 Aug 2016, at 06:56, Joel Koshy  wrote:
> 
> On Wed, Aug 17, 2016 at 9:13 PM, Ben Stopford  wrote:
> 
>> 
>> Let's us know if you have any further thoughts on KIP-73, else we'll kick
>> off a vote.
>> 
> 
> I think the mechanism for throttling replicas looks good. Just had a few
> more thoughts on the configuration section. What you have looks reasonable,
> but I was wondering if it could be made simpler. You probably thought
> through these, so I'm curious to know your take.
> 
> My guess is that most of the time, users would want to throttle all effect
> replication - due to partition reassignments, adding brokers or a broker
> coming back online after an extended period of time. In all these scenarios
> it may be possible to distinguish bootstrap (effect) vs normal replication
> - based on how far the replica has to catch up. I'm wondering if it is
> enough to just set an umbrella "effect" replication quota with perhaps
> per-topic overrides (say if some topics are more important than others) as
> opposed to designating throttled replicas.
> 
> Also, IIRC during client-side quota discussions we had considered the
> possibility of allowing clients to go above their quotas when resources are
> available. We ended up not doing that, but for replication throttling it
> may make sense - i.e., to treat the quota as a soft limit. Another way to
> look at it is instead of ensuring "effect replication traffic does not flow
> faster than X bytes/sec" it may be useful to instead ensure that "effect
> replication traffic only flows as slowly as necessary (so as not to
> adversely affect normal replication traffic)."
> 
> Thanks,
> 
> Joel
> 
>>> 
 On Thu, Aug 11, 2016 at 2:43 PM, Jun Rao >> > wrote:
 
> Hi, Joel,
> 
> Yes, the response size includes both throttled and unthrottled
>>> replicas.
> However, the response is only delayed up to max.wait if the response
>>> size
> is less than min.bytes, which matches the current behavior. So, there
>>> is
 no
> extra delay to due throttling, right? For replica fetchers, the
>> default
> min.byte is 1. So, the response is only delayed if there is no byte
>> in
 the
> response, which is what we want.
> 
> Thanks,
> 
> Jun
> 
> On Thu, Aug 11, 2016 at 11:53 AM, Joel Koshy >> >
 wrote:
> 
>> Hi Jun,
>> 
>> I'm not sure that would work unless we have separate replica
>>> fetchers,
>> since this would cause all replicas (including ones that are not
> throttled)
>> to get delayed. Instead, we could just have the leader populate the
>> throttle-time field of the response as a hint to the follower as to
>>> how
>> long it should wait before it adds those replicas back to its
 subsequent
>> replica fetch requests.
>> 
>> Thanks,
>> 
>> Joel
>> 
>> On Thu, Aug 11, 2016 at 9:50 AM, Jun Rao >> > wrote:
>> 
>>> Mayuresh,
>>> 
>>> That's a good question. I think if the response size (after
>> leader
>>> throttling) is smaller than min.bytes, we will just delay the
>>> sending
> of
>>> the response up to max.wait as we do now. This should prevent
 frequent
>>> empty responses to the follower.
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>>> On Wed, Aug 10, 2016 at 9:17 PM, Mayuresh Gharat <
>>> gharatmayures...@gmail.com 
 wrote:
>>> 
 This might have been answered before.
 I was wondering when the leader quota is reached and it sends
>>> empty
 response ( If the inclusion of a partition, listed in the
>>> leader's
 throttled-replicas list, causes the LeaderQuotaRate to be
>>> exceeded,
>> that
 par

Re: [DISCUSS] KIP-4 ACL Admin Schema

2016-08-18 Thread Grant Henke
Thanks for the feedback. Below are some responses:


> I don't have any problem with breaking things into 2 requests if it's
> necessary or optimal. But can you explain why separate requests "vastly
> simplifies the broker side implementation"? It doesn't seem like it should
> be particularly complex to process each ACL change in order.


You are right, it isn't super difficult to process ACL changes in order.
The simplicity I was thinking about comes from not needing to group and
re-order them like in the current implementation. It also removes the
"action" enumerator, which simplifies things a bit further. I am open to both
ideas (a single alter request processed in order vs. separate requests) as the
trade-offs aren't drastic.

I am leaning towards separate requests for a few reasons:

   - The Admin API will likely submit separate requests for delete and add
     regardless
      - I expect most admin APIs will have removeAcl/s and addAcl/s calls that
        fire a request immediately. I don't expect "batching", explicit or
        implicit, to be all that common.
   - Independent concerns and capabilities
      - Separating delete into its own request makes it easier to define
        "delete all" type behavior.
      - I think 2 simple requests may be easier to document and understand
        by client developers than 1 more complicated one.
   - Matches the Authorizer interface
      - Separate requests match the Authorizer interface more closely and
        still allow actions on collections of ACLs instead of one call per ACL,
        via Authorizer.addAcls(acls: Set[Acl], resource: Resource) and
        Authorizer.removeAcls(acls: Set[Acl], resource: Resource)

Hmm, even if all ACL requests are directed to the controller, concurrency
> is only guaranteed on the same connection. For requests coming from
> different connections, the order that they get processed on the broker is
> non-deterministic.


I was thinking less about making sure the exact order of requests is
handled correctly, since each client will likely get a response between
each request before sending another. It's more about ensuring local
state/cache is accurate and that there is no way 2 nodes can have different
ACLs which they each think are correct. Good implementations will handle this,
but may take a performance hit.

Perhaps I am overthinking it, and the SimpleAclAuthorizer is the only one
that would be affected by this (and likely rarely, because the volume of ACL
write requests should not be high). The SimpleAclAuthorizer is eventually
consistent between instances. It writes optimistically with the cached ZooKeeper
node version, writing a complete list of ACLs each time (not just adding or
removing individual ACLs). Below is a concrete demonstration of the impact I am
thinking about with the SimpleAclAuthorizer:

If we force all writes to go through one instance, then follow-up writes for a
resource (ignoring the first call, which warms the cache) would:

   1. Call addAcls
   2. Call updateResourceAcls, combining the current cached ACLs and the new
      ACLs
   3. Write the result to ZooKeeper via a conditional write on the cached
      version
   4. Succeed in 1 remote call

If requests can go through any instance, then follow-up writes for a
resource may:

   1. Call addAcls
   2. Call updateResourceAcls, combining the current cached ACLs and the new
      ACLs
      1. If there are no cached ACLs, read from ZooKeeper
   3. Write the result to ZooKeeper via a conditional write on the cached
      version
      1. If the cached version is wrong due to a write on another instance,
         read from ZooKeeper
      2. Rebuild the final ACL list
      3. Repeat until the write is successful
   4. Succeed in 1 or 4 (or more) remote calls

It looks like the Sentry implementation would not have this issue and the
Ranger implementation doesn't support modifying ACLs anyway (must use the
Ranger UI/API).
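
To make that concrete, here is a rough sketch of the read-merge-conditional-write
loop described above. The VersionedAclStore interface is a stand-in I'm using for
illustration, not the real SimpleAclAuthorizer/ZooKeeper API:

{code}
import java.util.HashSet;
import java.util.Set;

// Sketch only: VersionedAclStore stands in for the ZooKeeper-backed storage behind
// SimpleAclAuthorizer. The interface and names are illustrative, not the real API.
interface VersionedAclStore {
    VersionedAcls read(String resource);  // current ACLs plus the version they were read at
    boolean conditionalWrite(String resource, Set<String> acls, int expectedVersion);
}

class VersionedAcls {
    final Set<String> acls;
    final int version;
    VersionedAcls(Set<String> acls, int version) {
        this.acls = acls;
        this.version = version;
    }
}

class AclWriter {
    // Read-merge-write loop: every failed conditional write (another instance won the
    // race) forces a re-read and another attempt, i.e. the extra round trips above.
    static void addAcls(VersionedAclStore store, String resource, Set<String> toAdd) {
        while (true) {
            VersionedAcls current = store.read(resource);
            Set<String> merged = new HashSet<>(current.acls);
            merged.addAll(toAdd);
            if (store.conditionalWrite(resource, merged, current.version)) {
                return; // success
            }
            // Version mismatch: another instance wrote first; retry with fresh state.
        }
    }
}
{code}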

I wanted to explain my original thoughts, but I am happy to remove the
controller constraint, given that the SimpleAclAuthorizer appears to be the only
implementation (of those I know) that would be affected.

Thank you,
Grant






On Sat, Aug 13, 2016 at 5:41 PM, Ewen Cheslack-Postava 
wrote:

> On Mon, Aug 8, 2016 at 2:44 PM, Grant Henke  wrote:
>
> > Thank you for the feedback everyone. Below I respond to the last batch of
> > emails:
> >
> > You mention that "delete" actions
> > > will get processed before "add" actions, which makes sense to me. An
> > > alternative to avoid the confusion in the first place would be to
> replace
> > > the AlterAcls APIs with separate AddAcls and DeleteAcls APIs. Was this
> > > option already rejected?
> >
> >
> > 4. There is no CreateAcls or DeleteAcls (unlike CreateTopics and
> >
> > DeleteTopics, for example). It would be good to explain the reasoning for
> >
> > this choice (Jason also asked this question).
> >
> >
> > Re: 4 and Create/Delete vs Alter, I'm a fan of being able to bundle a
> bunch
> > > of changes in one request. Seems like an ACL change could easily
> include
> > > additions + dele

[jira] [Assigned] (KAFKA-4007) Improve fetch pipelining for low values of max.poll.records

2016-08-18 Thread Mickael Maison (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison reassigned KAFKA-4007:
-

Assignee: Mickael Maison

> Improve fetch pipelining for low values of max.poll.records
> ---
>
> Key: KAFKA-4007
> URL: https://issues.apache.org/jira/browse/KAFKA-4007
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Mickael Maison
>
> Currently the consumer will only send a prefetch for a partition after all 
> the records from the previous fetch have been consumed. This can lead to 
> suboptimal pipelining when max.poll.records is set very low since the 
> processing latency for a small set of records may be small compared to the 
> latency of a fetch. An improvement suggested by [~junrao] is to send the 
> fetch anyway even if we have unprocessed data buffered, but delay reading it 
> from the socket until that data has been consumed. Potentially the consumer 
> can delay reading _any_ pending fetch until it is ready to be returned to the 
> user, which may help control memory better. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Vahid S Hashemian
Congratulations Gwen!

--Vahid




From:   Ashish Singh 
To: "dev@kafka.apache.org" 
Date:   08/18/2016 09:27 AM
Subject:Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira



Congrats Gwen!

On Thursday, August 18, 2016, Grant Henke  wrote:

> Congratulations Gwen!
>
>
>
> On Thu, Aug 18, 2016 at 2:36 AM, Ismael Juma  > wrote:
>
> > Congratulations Gwen! Great news.
> >
> > Ismael
> >
> > On 18 Aug 2016 2:44 am, "Jun Rao" >
> wrote:
> >
> > > Hi, Everyone,
> > >
> > > Gwen Shapira has been active in the Kafka community since she became 
a
> > > Kafka committer
> > > about a year ago. I am glad to announce that Gwen is now a member of
> > Kafka
> > > PMC.
> > >
> > > Congratulations, Gwen!
> > >
> > > Jun
> > >
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com  | twitter.com/gchenke |
> linkedin.com/in/granthenke
>


-- 
Ashish 🎤h






Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Guozhang Wang
Congrats Gwen!

On Thu, Aug 18, 2016 at 9:27 AM, Ashish Singh  wrote:

> Congrats Gwen!
>
> On Thursday, August 18, 2016, Grant Henke  wrote:
>
> > Congratulations Gwen!
> >
> >
> >
> > On Thu, Aug 18, 2016 at 2:36 AM, Ismael Juma  > > wrote:
> >
> > > Congratulations Gwen! Great news.
> > >
> > > Ismael
> > >
> > > On 18 Aug 2016 2:44 am, "Jun Rao" >
> > wrote:
> > >
> > > > Hi, Everyone,
> > > >
> > > > Gwen Shapira has been active in the Kafka community since she became
> a
> > > > Kafka committer
> > > > about a year ago. I am glad to announce that Gwen is now a member of
> > > Kafka
> > > > PMC.
> > > >
> > > > Congratulations, Gwen!
> > > >
> > > > Jun
> > > >
> > >
> >
> >
> >
> > --
> > Grant Henke
> > Software Engineer | Cloudera
> > gr...@cloudera.com  | twitter.com/gchenke |
> > linkedin.com/in/granthenke
> >
>
>
> --
> Ashish 🎤h
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Jun Rao
Joel,

For your first comment: we thought about determining "effect" replicas
automatically as well. First, there are some tricky things that one has to
figure out, as Ben pointed out. For example, what's the definition of an
"effect" replica? If a replica falls out of the ISR temporarily, does it
become an "effect" replica immediately? (If so, those replicas may take
longer to get back in sync again due to throttling.) Also, how should the quota
bytes be distributed across effect replicas: evenly, or proportionally based on
lag? Figuring these out is possible, but requires more thought and experiments.
I am not sure this needs to be in the first version of replication throttling.
Second, even if you automate this, it seems that it would still be useful to
have a manual mode for the admins to control exactly what they want.
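
As a purely illustrative aside (not part of the KIP), the two distribution
options mentioned above could look something like this; the replica ids, lags
and method names are made up:

{code}
import java.util.HashMap;
import java.util.Map;

// Toy illustration of how a total replication quota could be split across
// "effect" replicas: evenly, or proportionally to lag.
public class QuotaSplitSketch {

    // Evenly: every throttled replica gets the same share of the budget.
    static Map<Integer, Double> splitEvenly(Map<Integer, Long> lagByReplica, long quotaBytesPerSec) {
        Map<Integer, Double> allocation = new HashMap<>();
        for (Integer replica : lagByReplica.keySet()) {
            allocation.put(replica, (double) quotaBytesPerSec / lagByReplica.size());
        }
        return allocation;
    }

    // Proportionally to lag: replicas that are further behind get a larger share.
    static Map<Integer, Double> splitByLag(Map<Integer, Long> lagByReplica, long quotaBytesPerSec) {
        long totalLag = lagByReplica.values().stream().mapToLong(Long::longValue).sum();
        Map<Integer, Double> allocation = new HashMap<>();
        for (Map.Entry<Integer, Long> entry : lagByReplica.entrySet()) {
            double share = totalLag == 0
                    ? 1.0 / lagByReplica.size()
                    : (double) entry.getValue() / totalLag;
            allocation.put(entry.getKey(), share * quotaBytesPerSec);
        }
        return allocation;
    }
}
{code}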

For your second comment, we discussed that in the client quotas design. A
downside of that for client quotas is that a client may be surprised that
its traffic is not throttled at one time, but is throttled at another with the
same quota (basically, less predictability). You can imagine setting a quota
for all replication traffic and only slowing down the "effect" replicas if
needed. The thought is more or less the same as the above. It requires more
thinking since it's more complicated. In any case, it seems it's always
useful to have a manual mode for admins who want to have full control of
what and how much to throttle.

Thanks,

Jun


On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford  wrote:

> Hi Joel
>
> Ha! yes we had some similar thoughts, on both counts. Both are actually
> good approaches, but come with some extra complexity.
>
> Segregating the replication type is tempting as it creates a more general
> solution. One issue is you need to draw a line between lagging and not
> lagging. The ISR ‘limit' is a tempting divider, but has the side effect
> that, once you drop out you get immediately throttled. Adding a
> configurable divider is another option, but difficult for admins to set,
> and always a little arbitrary. A better idea is to prioritise, in reverse
> order to lag. But that also comes with additional complexity of its own.
>
> Under throttling is also a tempting addition. That’s to say, if there’s
> idle bandwidth lying around, not being used, why not use it to let lagging
> brokers catch up. This involves some comparison to the maximum bandwidth,
> which could be configurable, or could be derived, with pros and cons for
> each.
>
> But the more general problem is actually quite hard to reason about, so
> after some discussion we decided to settle on something simple, that we
> felt we could get working, and extend to add these additional features as
> subsequent KIPs.
>
> I hope that seems reasonable. Jun may wish to add to this.
>
> B
>
>
> > On 18 Aug 2016, at 06:56, Joel Koshy  wrote:
> >
> > On Wed, Aug 17, 2016 at 9:13 PM, Ben Stopford  wrote:
> >
> >>
> >> Let's us know if you have any further thoughts on KIP-73, else we'll
> kick
> >> off a vote.
> >>
> >
> > I think the mechanism for throttling replicas looks good. Just had a few
> > more thoughts on the configuration section. What you have looks
> reasonable,
> > but I was wondering if it could be made simpler. You probably thought
> > through these, so I'm curious to know your take.
> >
> > My guess is that most of the time, users would want to throttle all
> effect
> > replication - due to partition reassignments, adding brokers or a broker
> > coming back online after an extended period of time. In all these
> scenarios
> > it may be possible to distinguish bootstrap (effect) vs normal
> replication
> > - based on how far the replica has to catch up. I'm wondering if it is
> > enough to just set an umbrella "effect" replication quota with perhaps
> > per-topic overrides (say if some topics are more important than others)
> as
> > opposed to designating throttled replicas.
> >
> > Also, IIRC during client-side quota discussions we had considered the
> > possibility of allowing clients to go above their quotas when resources
> are
> > available. We ended up not doing that, but for replication throttling it
> > may make sense - i.e., to treat the quota as a soft limit. Another way to
> > look at it is instead of ensuring "effect replication traffic does not
> flow
> > faster than X bytes/sec" it may be useful to instead ensure that "effect
> > replication traffic only flows as slowly as necessary (so as not to
> > adversely affect normal replication traffic)."
> >
> > Thanks,
> >
> > Joel
> >
> >>>
>  On Thu, Aug 11, 2016 at 2:43 PM, Jun Rao  >>> > wrote:
> 
> > Hi, Joel,
> >
> > Yes, the response size includes both throttled and unthrottled
> >>> replicas.
> > However, the response is only delayed up to max.wait if the response
> >>> size
> > is less than min.bytes, which matches the current behavior. So, there
> >>> is
>  no
> > extra delay to due throttling, right?

Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Joel Koshy
> For your first comment. We thought about determining "effect" replicas
> automatically as well. First, there are some tricky stuff that one has to
>

Auto-detection of effect traffic: I'm fairly certain it's doable but
definitely tricky. I'm also not sure it is something worth tackling at the
outset. If we want to spend more time thinking it over, even if it's just an
academic exercise, I would be happy to brainstorm offline.


> For your second comment, we discussed that in the client quotas design. A
> down side of that for client quotas is that a client may be surprised that
> its traffic is not throttled at one time, but throttled as another with the
> same quota (basically, less predicability). You can imaging setting a quota
> for all replication traffic and only slow down the "effect" replicas if
> needed. The thought is more or less the same as the above. It requires more
>

For clients, this is true. I think this is much less of an issue for
server-side replication since the "users" here are the Kafka SREs who
generally know these internal details.

I think it would be valuable to get some feedback from SREs on the proposal
before proceeding to a vote. (ping Todd)

Joel


>
> On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford  wrote:
>
> > Hi Joel
> >
> > Ha! yes we had some similar thoughts, on both counts. Both are actually
> > good approaches, but come with some extra complexity.
> >
> > Segregating the replication type is tempting as it creates a more general
> > solution. One issue is you need to draw a line between lagging and not
> > lagging. The ISR ‘limit' is a tempting divider, but has the side effect
> > that, once you drop out you get immediately throttled. Adding a
> > configurable divider is another option, but difficult for admins to set,
> > and always a little arbitrary. A better idea is to prioritise, in reverse
> > order to lag. But that also comes with additional complexity of its own.
> >
> > Under throttling is also a tempting addition. That’s to say, if there’s
> > idle bandwidth lying around, not being used, why not use it to let
> lagging
> > brokers catch up. This involves some comparison to the maximum bandwidth,
> > which could be configurable, or could be derived, with pros and cons for
> > each.
> >
> > But the more general problem is actually quite hard to reason about, so
> > after some discussion we decided to settle on something simple, that we
> > felt we could get working, and extend to add these additional features as
> > subsequent KIPs.
> >
> > I hope that seems reasonable. Jun may wish to add to this.
> >
> > B
> >
> >
> > > On 18 Aug 2016, at 06:56, Joel Koshy  wrote:
> > >
> > > On Wed, Aug 17, 2016 at 9:13 PM, Ben Stopford 
> wrote:
> > >
> > >>
> > >> Let's us know if you have any further thoughts on KIP-73, else we'll
> > kick
> > >> off a vote.
> > >>
> > >
> > > I think the mechanism for throttling replicas looks good. Just had a
> few
> > > more thoughts on the configuration section. What you have looks
> > reasonable,
> > > but I was wondering if it could be made simpler. You probably thought
> > > through these, so I'm curious to know your take.
> > >
> > > My guess is that most of the time, users would want to throttle all
> > effect
> > > replication - due to partition reassignments, adding brokers or a
> broker
> > > coming back online after an extended period of time. In all these
> > scenarios
> > > it may be possible to distinguish bootstrap (effect) vs normal
> > replication
> > > - based on how far the replica has to catch up. I'm wondering if it is
> > > enough to just set an umbrella "effect" replication quota with perhaps
> > > per-topic overrides (say if some topics are more important than others)
> > as
> > > opposed to designating throttled replicas.
> > >
> > > Also, IIRC during client-side quota discussions we had considered the
> > > possibility of allowing clients to go above their quotas when resources
> > are
> > > available. We ended up not doing that, but for replication throttling
> it
> > > may make sense - i.e., to treat the quota as a soft limit. Another way
> to
> > > look at it is instead of ensuring "effect replication traffic does not
> > flow
> > > faster than X bytes/sec" it may be useful to instead ensure that
> "effect
> > > replication traffic only flows as slowly as necessary (so as not to
> > > adversely affect normal replication traffic)."
> > >
> > > Thanks,
> > >
> > > Joel
> > >
> > >>>
> >  On Thu, Aug 11, 2016 at 2:43 PM, Jun Rao  > >>> > wrote:
> > 
> > > Hi, Joel,
> > >
> > > Yes, the response size includes both throttled and unthrottled
> > >>> replicas.
> > > However, the response is only delayed up to max.wait if the
> response
> > >>> size
> > > is less than min.bytes, which matches the current behavior. So,
> there
> > >>> is
> >  no
> > > extra delay to due throttling, right? For replica fetchers, the
> > >> default
> > > min.byte is 1. So, the response is only de

Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Gwen Shapira
Thanks team Kafka :) Very excited and happy to contribute and be part
of this fantastic community.



On Thu, Aug 18, 2016 at 9:52 AM, Guozhang Wang  wrote:
> Congrats Gwen!
>
> On Thu, Aug 18, 2016 at 9:27 AM, Ashish Singh  wrote:
>
>> Congrats Gwen!
>>
>> On Thursday, August 18, 2016, Grant Henke  wrote:
>>
>> > Congratulations Gwen!
>> >
>> >
>> >
>> > On Thu, Aug 18, 2016 at 2:36 AM, Ismael Juma > > > wrote:
>> >
>> > > Congratulations Gwen! Great news.
>> > >
>> > > Ismael
>> > >
>> > > On 18 Aug 2016 2:44 am, "Jun Rao" >
>> > wrote:
>> > >
>> > > > Hi, Everyone,
>> > > >
>> > > > Gwen Shapira has been active in the Kafka community since she became
>> a
>> > > > Kafka committer
>> > > > about a year ago. I am glad to announce that Gwen is now a member of
>> > > Kafka
>> > > > PMC.
>> > > >
>> > > > Congratulations, Gwen!
>> > > >
>> > > > Jun
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Grant Henke
>> > Software Engineer | Cloudera
>> > gr...@cloudera.com  | twitter.com/gchenke |
>> > linkedin.com/in/granthenke
>> >
>>
>>
>> --
>> Ashish 🎤h
>>
>
>
>
> --
> -- Guozhang



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Jun Rao
Joel,

Yes, for your second comment. The tricky thing is still to figure out which
replicas to throttle and by how much, since in general admins probably
don't want already in-sync (or close to in-sync) replicas to be throttled. It
would be great to get Todd's opinion on this. Could you ping him?

Yes, we'd be happy to discuss auto-detection of effect traffic more offline.

Thanks,

Jun

On Thu, Aug 18, 2016 at 10:21 AM, Joel Koshy  wrote:

> > For your first comment. We thought about determining "effect" replicas
> > automatically as well. First, there are some tricky stuff that one has to
> >
>
> Auto-detection of effect traffic: i'm fairly certain it's doable but
> definitely tricky. I'm also not sure it is something worth tackling at the
> outset. If we want to spend more time thinking over it even if it's just an
> academic exercise I would be happy to brainstorm offline.
>
>
> > For your second comment, we discussed that in the client quotas design. A
> > down side of that for client quotas is that a client may be surprised
> that
> > its traffic is not throttled at one time, but throttled as another with
> the
> > same quota (basically, less predicability). You can imaging setting a
> quota
> > for all replication traffic and only slow down the "effect" replicas if
> > needed. The thought is more or less the same as the above. It requires
> more
> >
>
> For clients, this is true. I think this is much less of an issue for
> server-side replication since the "users" here are the Kafka SREs who
> generally know these internal details.
>
> I think it would be valuable to get some feedback from SREs on the proposal
> before proceeding to a vote. (ping Todd)
>
> Joel
>
>
> >
> > On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford  wrote:
> >
> > > Hi Joel
> > >
> > > Ha! yes we had some similar thoughts, on both counts. Both are actually
> > > good approaches, but come with some extra complexity.
> > >
> > > Segregating the replication type is tempting as it creates a more
> general
> > > solution. One issue is you need to draw a line between lagging and not
> > > lagging. The ISR ‘limit' is a tempting divider, but has the side effect
> > > that, once you drop out you get immediately throttled. Adding a
> > > configurable divider is another option, but difficult for admins to
> set,
> > > and always a little arbitrary. A better idea is to prioritise, in
> reverse
> > > order to lag. But that also comes with additional complexity of its
> own.
> > >
> > > Under throttling is also a tempting addition. That’s to say, if there’s
> > > idle bandwidth lying around, not being used, why not use it to let
> > lagging
> > > brokers catch up. This involves some comparison to the maximum
> bandwidth,
> > > which could be configurable, or could be derived, with pros and cons
> for
> > > each.
> > >
> > > But the more general problem is actually quite hard to reason about, so
> > > after some discussion we decided to settle on something simple, that we
> > > felt we could get working, and extend to add these additional features
> as
> > > subsequent KIPs.
> > >
> > > I hope that seems reasonable. Jun may wish to add to this.
> > >
> > > B
> > >
> > >
> > > > On 18 Aug 2016, at 06:56, Joel Koshy  wrote:
> > > >
> > > > On Wed, Aug 17, 2016 at 9:13 PM, Ben Stopford 
> > wrote:
> > > >
> > > >>
> > > >> Let's us know if you have any further thoughts on KIP-73, else we'll
> > > kick
> > > >> off a vote.
> > > >>
> > > >
> > > > I think the mechanism for throttling replicas looks good. Just had a
> > few
> > > > more thoughts on the configuration section. What you have looks
> > > reasonable,
> > > > but I was wondering if it could be made simpler. You probably thought
> > > > through these, so I'm curious to know your take.
> > > >
> > > > My guess is that most of the time, users would want to throttle all
> > > effect
> > > > replication - due to partition reassignments, adding brokers or a
> > broker
> > > > coming back online after an extended period of time. In all these
> > > scenarios
> > > > it may be possible to distinguish bootstrap (effect) vs normal
> > > replication
> > > > - based on how far the replica has to catch up. I'm wondering if it
> is
> > > > enough to just set an umbrella "effect" replication quota with
> perhaps
> > > > per-topic overrides (say if some topics are more important than
> others)
> > > as
> > > > opposed to designating throttled replicas.
> > > >
> > > > Also, IIRC during client-side quota discussions we had considered the
> > > > possibility of allowing clients to go above their quotas when
> resources
> > > are
> > > > available. We ended up not doing that, but for replication throttling
> > it
> > > > may make sense - i.e., to treat the quota as a soft limit. Another
> way
> > to
> > > > look at it is instead of ensuring "effect replication traffic does
> not
> > > flow
> > > > faster than X bytes/sec" it may be useful to instead ensure that
> > "effect
> > > > replication traffic only flow

[GitHub] kafka pull request #1760: KAFKA-3937: Kafka Clients Leak Native Memory For L...

2016-08-18 Thread wiyu
GitHub user wiyu opened a pull request:

https://github.com/apache/kafka/pull/1760

KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With 
Compressed Messages

@ijuma - Making the change against trunk based on your suggestions to have 
the stream closing handled in the private RecordIterator constructor, which I 
understand is only to be used if the block of messages is compressed. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wiyu/kafka compressor_memory_leak_in_fetcher

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1760


commit 56451a38399875ae9d1ccdb2f873de88eae6f940
Author: William Yu 
Date:   2016-08-18T17:25:10Z

KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With 
Compressed Messages




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3937) Kafka Clients Leak Native Memory For Longer Than Needed With Compressed Messages

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426851#comment-15426851
 ] 

ASF GitHub Bot commented on KAFKA-3937:
---

GitHub user wiyu opened a pull request:

https://github.com/apache/kafka/pull/1760

KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With 
Compressed Messages

@ijuma - Making the change against trunk based on your suggestions to have 
the stream closing handled in the private RecordIterator constructor, which I 
understand is only to be used if the block of messages is compressed. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wiyu/kafka compressor_memory_leak_in_fetcher

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1760


commit 56451a38399875ae9d1ccdb2f873de88eae6f940
Author: William Yu 
Date:   2016-08-18T17:25:10Z

KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With 
Compressed Messages




> Kafka Clients Leak Native Memory For Longer Than Needed With Compressed 
> Messages
> 
>
> Key: KAFKA-3937
> URL: https://issues.apache.org/jira/browse/KAFKA-3937
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.8.2.2, 0.9.0.1, 0.10.0.0
> Environment: Linux, latest oracle java-8
>Reporter: Tom Crayford
>Priority: Minor
>
> In https://issues.apache.org/jira/browse/KAFKA-3933, we discovered that 
> brokers can crash when performing log recovery, as they leak native memory 
> whilst decompressing compressed segments, and that native memory isn't 
> cleaned up rapidly enough by garbage collection and finalizers. The work to 
> fix that in the brokers is taking part in 
> https://github.com/apache/kafka/pull/1598. As part of that PR, Ismael Juma 
> asked me to fix similar issues in the client. Rather than have one large PR 
> that fixes everything, I'd rather break this work up into separate things, so 
> I'm filing this JIRA to track the followup work. I should get to a PR on this 
> at some point relatively soon, once the other PR has landed.
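
For context, the pattern involved here is simply closing the decompression
stream as soon as it is no longer needed, rather than leaving that to garbage
collection and finalizers. A minimal illustrative sketch using the JDK's GZIP
stream (this is not the Kafka client code; the class and method names are
invented for the example):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class EagerDecompress {
    // Decompress a GZIP block and close the stream as soon as we are done with
    // it, so the Inflater's native zlib buffers are freed immediately rather
    // than whenever a finalizer happens to run.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }   // in.close() runs here, releasing the native memory deterministically
        return out.toByteArray();
    }
}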



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Gwen Shapira
Just my take, since Jun and Ben originally wanted to solve a more
general approach and I talked them out of it :)

When we first add the feature, safety is probably most important in
getting people to adopt it - I wanted to make the feature very safe by
never throttling something admins don't want to throttle. So we
figured the manual approach, while more challenging to configure, is the
safest. Admins usually know which replicas are "at risk" of taking
over and can choose to throttle them accordingly; they can build their
own integration with monitoring tools, etc.

It feels like any "smarts" we try and build into Kafka can be done
better with external tools that can watch both Kafka traffic (with the
new metrics) and things like network and CPU monitors.

We are open to a smarter approach in Kafka, but perhaps plan it for a
follow-up KIP? Maybe even after we have some experience with the
manual approach and how best to make throttling decisions.
Similar to what we do with choosing partitions to move around - we
started manually, admins are getting experience at how they like to
choose replicas and then we can bake their expertise into the product.

Gwen
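
As a purely hypothetical illustration of the kind of external tooling described
above (the identifiers and the lag-based heuristic are assumptions for the
example, not part of the KIP), such a tool might simply select the replicas
whose lag exceeds a threshold and feed that list into whatever throttle
configuration the admin chooses to apply:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ThrottleCandidates {
    // replicaLag maps a replica identifier (e.g. "topic-partition@broker") to
    // its current lag in messages; anything above the threshold becomes a
    // candidate for throttling by the admin's own tooling.
    static List<String> select(Map<String, Long> replicaLag, long lagThreshold) {
        return replicaLag.entrySet().stream()
                .filter(e -> e.getValue() > lagThreshold)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}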

On Thu, Aug 18, 2016 at 10:29 AM, Jun Rao  wrote:
> Joel,
>
> Yes, for your second comment. The tricky thing is still to figure out which
> replicas to throttle and by how much since in general, admins probably
> don't want already in-sync or close to in-sync replicas to be throttled. It
> would be great to get Todd's opinion on this. Could you ping him?
>
> Yes, we'd be happy to discuss auto-detection of effect traffic more offline.
>
> Thanks,
>
> Jun
>
> On Thu, Aug 18, 2016 at 10:21 AM, Joel Koshy  wrote:
>
>> > For your first comment. We thought about determining "effect" replicas
>> > automatically as well. First, there are some tricky stuff that one has to
>> >
>>
>> Auto-detection of effect traffic: i'm fairly certain it's doable but
>> definitely tricky. I'm also not sure it is something worth tackling at the
>> outset. If we want to spend more time thinking over it even if it's just an
>> academic exercise I would be happy to brainstorm offline.
>>
>>
>> > For your second comment, we discussed that in the client quotas design. A
>> > down side of that for client quotas is that a client may be surprised
>> that
>> > its traffic is not throttled at one time, but throttled as another with
>> the
>> > same quota (basically, less predicability). You can imaging setting a
>> quota
>> > for all replication traffic and only slow down the "effect" replicas if
>> > needed. The thought is more or less the same as the above. It requires
>> more
>> >
>>
>> For clients, this is true. I think this is much less of an issue for
>> server-side replication since the "users" here are the Kafka SREs who
>> generally know these internal details.
>>
>> I think it would be valuable to get some feedback from SREs on the proposal
>> before proceeding to a vote. (ping Todd)
>>
>> Joel
>>
>>
>> >
>> > On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford  wrote:
>> >
>> > > Hi Joel
>> > >
>> > > Ha! yes we had some similar thoughts, on both counts. Both are actually
>> > > good approaches, but come with some extra complexity.
>> > >
>> > > Segregating the replication type is tempting as it creates a more
>> general
>> > > solution. One issue is you need to draw a line between lagging and not
>> > > lagging. The ISR ‘limit' is a tempting divider, but has the side effect
>> > > that, once you drop out you get immediately throttled. Adding a
>> > > configurable divider is another option, but difficult for admins to
>> set,
>> > > and always a little arbitrary. A better idea is to prioritise, in
>> reverse
>> > > order to lag. But that also comes with additional complexity of its
>> own.
>> > >
>> > > Under throttling is also a tempting addition. That’s to say, if there’s
>> > > idle bandwidth lying around, not being used, why not use it to let
>> > lagging
>> > > brokers catch up. This involves some comparison to the maximum
>> bandwidth,
>> > > which could be configurable, or could be derived, with pros and cons
>> for
>> > > each.
>> > >
>> > > But the more general problem is actually quite hard to reason about, so
>> > > after some discussion we decided to settle on something simple, that we
>> > > felt we could get working, and extend to add these additional features
>> as
>> > > subsequent KIPs.
>> > >
>> > > I hope that seems reasonable. Jun may wish to add to this.
>> > >
>> > > B
>> > >
>> > >
>> > > > On 18 Aug 2016, at 06:56, Joel Koshy  wrote:
>> > > >
>> > > > On Wed, Aug 17, 2016 at 9:13 PM, Ben Stopford 
>> > wrote:
>> > > >
>> > > >>
>> > > >> Let's us know if you have any further thoughts on KIP-73, else we'll
>> > > kick
>> > > >> off a vote.
>> > > >>
>> > > >
>> > > > I think the mechanism for throttling replicas looks good. Just had a
>> > few
>> > > > more thoughts on the configuration section. What you have looks
>> > > reasonable,
>> > > > but I was wondering if i

Should we have a KIP call?

2016-08-18 Thread Grant Henke
I am thinking it might be a good time to have a Kafka KIP call. There are a
lot of KIPs and discussions in progress that could benefit from a "quick"
call to discuss, coordinate, and prioritize.

Some of the voted topics we could discuss are:
(I didn't include ones that were just voted or will pass just before the
call)

   - KIP-33: Add a time based log index
   - KIP-50: Move Authorizer to o.a.k.common package
   - KIP-55: Secure Quotas for Authenticated Users
   - KIP-67: Queryable state for Kafka Streams
   - KIP-70: Revise Partition Assignment Semantics on New Consumer's
   Subscription Change

Some of the un-voted topics we could discuss are:

   - Time-based releases for Apache Kafka
   - Java 7 support timeline
   - KIP-4: ACL Admin Schema
   - KIP-37 - Add Namespaces to Kafka
   - KIP-48: Delegation token support for Kafka
   - KIP-54: Sticky Partition Assignment Strategy
   - KIP-63: Unify store and downstream caching in streams
   - KIP-66: Add Kafka Connect Transformers to allow transformations to
   messages
   - KIP-72 Allow Sizing Incoming Request Queue in Bytes
   - KIP-73: Replication Quotas
   - KIP-74: Add FetchResponse size limit in bytes

As a side note it may be worth moving some open KIPs to a "parked" list if
they are not being actively worked on. We can include a reason why as well.
Reasons could include being blocked, parked, dormant (no activity), or
abandoned (creator isn't working on it and others can pick it up). We would
need to ask the KIP creator or define some length of time before we call a
KIP abandoned and available for pickup.

Some KIPs which may be candidates to be "parked" in a first pass are:

   - KIP-6 - New reassignment partition logic for rebalancing (dormant)
   - KIP-14 - Tools standardization (dormant)
   - KIP-17 - Add HighwaterMarkOffset to OffsetFetchResponse (dormant)
   - KIP-18 - JBOD Support (dormant)
   - KIP-23 - Add JSON/CSV output and looping options to
   ConsumerGroupCommand (dormant)
   - KIP-27 - Conditional Publish (dormant)
   - KIP-30 - Allow for brokers to have plug-able consensus and meta data
   storage sub systems (dormant)
   - KIP-39: Pinning controller to broker (dormant)
   - KIP-44 - Allow Kafka to have a customized security protocol (dormant)
   - KIP-46 - Self Healing (dormant)
   - KIP-47 - Add timestamp-based log deletion policy (blocked - by KIP-33)
   - KIP-53 - Add custom policies for reconnect attempts to NetworkdClient
   - KIP-58 - Make Log Compaction Point Configurable (blocked - by KIP-33)
   - KIP-61: Add a log retention parameter for maximum disk space usage
   percentage (dormant)
   - KIP-68 Add a consumed log retention before log retention (dormant)

Thank you,
Grant
-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Mayuresh Gharat
Congrats Gwen :)

Thanks,

Mayuresh

On Thu, Aug 18, 2016 at 10:27 AM, Gwen Shapira  wrote:

> Thanks team Kafka :) Very excited and happy to contribute and be part
> of this fantastic community.
>
>
>
> On Thu, Aug 18, 2016 at 9:52 AM, Guozhang Wang  wrote:
> > Congrats Gwen!
> >
> > On Thu, Aug 18, 2016 at 9:27 AM, Ashish Singh 
> wrote:
> >
> >> Congrats Gwen!
> >>
> >> On Thursday, August 18, 2016, Grant Henke  wrote:
> >>
> >> > Congratulations Gwen!
> >> >
> >> >
> >> >
> >> > On Thu, Aug 18, 2016 at 2:36 AM, Ismael Juma  >> > > wrote:
> >> >
> >> > > Congratulations Gwen! Great news.
> >> > >
> >> > > Ismael
> >> > >
> >> > > On 18 Aug 2016 2:44 am, "Jun Rao" >
> >> > wrote:
> >> > >
> >> > > > Hi, Everyone,
> >> > > >
> >> > > > Gwen Shapira has been active in the Kafka community since she
> became
> >> a
> >> > > > Kafka committer
> >> > > > about a year ago. I am glad to announce that Gwen is now a member
> of
> >> > > Kafka
> >> > > > PMC.
> >> > > >
> >> > > > Congratulations, Gwen!
> >> > > >
> >> > > > Jun
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Grant Henke
> >> > Software Engineer | Cloudera
> >> > gr...@cloudera.com  | twitter.com/gchenke |
> >> > linkedin.com/in/granthenke
> >> >
> >>
> >>
> >> --
> >> Ashish 🎤h
> >>
> >
> >
> >
> > --
> > -- Guozhang
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Todd Palino
Joel just reminded me to take another look at this one :) So first off,
this is great. It’s something that we definitely need to have, especially
as we get into the realm of moving partitions around more often.

I do prefer to have the cluster handle this automatically. What I envision
is a single configuration for “bootstrap replication quota” that is applied
when we have a replica that is in this situation. There are two legitimate
cases that I’m aware of right now:
1 - We are moving a partition to a new replica. We know about this (at
least the controller does), so we should be able to apply the quota without
too much trouble here
2 - We have a broker that lost its disk and has to recover the partition
from the cluster. Harder to detect, but in this case, I’m not sure I even
want to throttle it because this is recovery activity.

The problem with this becomes the question of “why”. Why are you moving a
partition? Are you doing it because you want to balance traffic? Or are you
doing it because you lost a piece of hardware and you need to get the RF
for the partition back up to the desired level? As an admin, these have
different priorities. I may be perfectly fine with having the replication
traffic saturate the cluster in the latter case, because reliability and
availability are more important than performance.

Given the complexity of trying to determine intent, I’m going to agree with
implementing a manual procedure for now. We definitely need to have a
discussion about automating it as much as possible, but I think it’s part
of a larger conversation about how much automation should be built into the
broker itself, and how much should be part of a bolt-on “cluster manager”.
I’m not sure putting all that complexity into the broker is the right
choice.

I do agree with Joel here that while a hard quota is typically better from
a client point of view, in the case of replication traffic a soft quota is
appropriate, and desirable. Probably a combination of both, as I think we
still want a hard limit that stops short of saturating the entire cluster
with replication traffic.

-Todd
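
To make the soft-versus-hard distinction concrete, here is a small, purely
conceptual Java sketch (not the KIP-73 design and not broker code; all names
are invented): the throttle always grants the configured quota, lets catch-up
traffic borrow idle bandwidth, and still stops short of a hard ceiling.

public class SoftReplicationThrottle {
    private final long softBytesPerSec;  // the configured replication quota
    private final long hardBytesPerSec;  // ceiling that stops short of saturating the link

    public SoftReplicationThrottle(long softBytesPerSec, long hardBytesPerSec) {
        this.softBytesPerSec = softBytesPerSec;
        this.hardBytesPerSec = hardBytesPerSec;
    }

    // How many bytes of replication traffic to allow this second, given the
    // bytes currently used by client traffic and the link's total capacity.
    public long allowedReplicationBytes(long clientBytesPerSec, long linkCapacityBytesPerSec) {
        long idle = Math.max(0, linkCapacityBytesPerSec - clientBytesPerSec - softBytesPerSec);
        // Soft limit: the quota plus any idle headroom, capped by the hard limit.
        return Math.min(softBytesPerSec + idle, hardBytesPerSec);
    }
}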


On Thu, Aug 18, 2016 at 10:21 AM, Joel Koshy  wrote:

> > For your first comment. We thought about determining "effect" replicas
> > automatically as well. First, there are some tricky stuff that one has to
> >
>
> Auto-detection of effect traffic: i'm fairly certain it's doable but
> definitely tricky. I'm also not sure it is something worth tackling at the
> outset. If we want to spend more time thinking over it even if it's just an
> academic exercise I would be happy to brainstorm offline.
>
>
> > For your second comment, we discussed that in the client quotas design. A
> > down side of that for client quotas is that a client may be surprised
> that
> > its traffic is not throttled at one time, but throttled as another with
> the
> > same quota (basically, less predicability). You can imaging setting a
> quota
> > for all replication traffic and only slow down the "effect" replicas if
> > needed. The thought is more or less the same as the above. It requires
> more
> >
>
> For clients, this is true. I think this is much less of an issue for
> server-side replication since the "users" here are the Kafka SREs who
> generally know these internal details.
>
> I think it would be valuable to get some feedback from SREs on the proposal
> before proceeding to a vote. (ping Todd)
>
> Joel
>
>
> >
> > On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford  wrote:
> >
> > > Hi Joel
> > >
> > > Ha! yes we had some similar thoughts, on both counts. Both are actually
> > > good approaches, but come with some extra complexity.
> > >
> > > Segregating the replication type is tempting as it creates a more
> general
> > > solution. One issue is you need to draw a line between lagging and not
> > > lagging. The ISR ‘limit' is a tempting divider, but has the side effect
> > > that, once you drop out you get immediately throttled. Adding a
> > > configurable divider is another option, but difficult for admins to
> set,
> > > and always a little arbitrary. A better idea is to prioritise, in
> reverse
> > > order to lag. But that also comes with additional complexity of its
> own.
> > >
> > > Under throttling is also a tempting addition. That’s to say, if there’s
> > > idle bandwidth lying around, not being used, why not use it to let
> > lagging
> > > brokers catch up. This involves some comparison to the maximum
> bandwidth,
> > > which could be configurable, or could be derived, with pros and cons
> for
> > > each.
> > >
> > > But the more general problem is actually quite hard to reason about, so
> > > after some discussion we decided to settle on something simple, that we
> > > felt we could get working, and extend to add these additional features
> as
> > > subsequent KIPs.
> > >
> > > I hope that seems reasonable. Jun may wish to add to this.
> > >
> > > B
> > >
> > >
> > > > On 18 Aug 2016, at 06:56, Joel Koshy  wrote:
> > > >
> > > > On Wed, Aug 17,

Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Todd Palino
This all makes a lot of sense, and mirrors what I’m thinking as I finally
took some time to really walk through scenarios around why we move
partitions around.

What I’m wondering is whether it makes sense to have a conversation around breaking
out the controller entirely, separating it from the brokers, and starting
to add this intelligence into that. I don’t think anyone will disagree that
the controller needs a sizable amount of work. This definitely wouldn’t be
the first project to separate out the brains from the dumb worker processes.

-Todd


On Thu, Aug 18, 2016 at 10:53 AM, Gwen Shapira  wrote:

> Just my take, since Jun and Ben originally wanted to solve a more
> general approach and I talked them out of it :)
>
> When we first add the feature, safety is probably most important in
> getting people to adopt it - I wanted to make the feature very safe by
> never throttling something admins don't want to throttle. So we
> figured manual approach, while more challenging to configure, is the
> safest. Admins usually know which replicas are "at risk" of taking
> over and can choose to throttle them accordingly, they can build their
> own integration with monitoring tools, etc.
>
> It feels like any "smarts" we try and build into Kafka can be done
> better with external tools that can watch both Kafka traffic (with the
> new metrics) and things like network and CPU monitors.
>
> We are open to a smarter approach in Kafka, but perhaps plan it for a
> follow-up KIP? Maybe even after we have some experience with the
> manual approach and how best to make throttling decisions.
> Similar to what we do with choosing partitions to move around - we
> started manually, admins are getting experience at how they like to
> choose replicas and then we can bake their expertise into the product.
>
> Gwen
>
> On Thu, Aug 18, 2016 at 10:29 AM, Jun Rao  wrote:
> > Joel,
> >
> > Yes, for your second comment. The tricky thing is still to figure out
> which
> > replicas to throttle and by how much since in general, admins probably
> > don't want already in-sync or close to in-sync replicas to be throttled.
> It
> > would be great to get Todd's opinion on this. Could you ping him?
> >
> > Yes, we'd be happy to discuss auto-detection of effect traffic more
> offline.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Aug 18, 2016 at 10:21 AM, Joel Koshy 
> wrote:
> >
> >> > For your first comment. We thought about determining "effect" replicas
> >> > automatically as well. First, there are some tricky stuff that one
> has to
> >> >
> >>
> >> Auto-detection of effect traffic: i'm fairly certain it's doable but
> >> definitely tricky. I'm also not sure it is something worth tackling at
> the
> >> outset. If we want to spend more time thinking over it even if it's
> just an
> >> academic exercise I would be happy to brainstorm offline.
> >>
> >>
> >> > For your second comment, we discussed that in the client quotas
> design. A
> >> > down side of that for client quotas is that a client may be surprised
> >> that
> >> > its traffic is not throttled at one time, but throttled as another
> with
> >> the
> >> > same quota (basically, less predicability). You can imaging setting a
> >> quota
> >> > for all replication traffic and only slow down the "effect" replicas
> if
> >> > needed. The thought is more or less the same as the above. It requires
> >> more
> >> >
> >>
> >> For clients, this is true. I think this is much less of an issue for
> >> server-side replication since the "users" here are the Kafka SREs who
> >> generally know these internal details.
> >>
> >> I think it would be valuable to get some feedback from SREs on the
> proposal
> >> before proceeding to a vote. (ping Todd)
> >>
> >> Joel
> >>
> >>
> >> >
> >> > On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford 
> wrote:
> >> >
> >> > > Hi Joel
> >> > >
> >> > > Ha! yes we had some similar thoughts, on both counts. Both are
> actually
> >> > > good approaches, but come with some extra complexity.
> >> > >
> >> > > Segregating the replication type is tempting as it creates a more
> >> general
> >> > > solution. One issue is you need to draw a line between lagging and
> not
> >> > > lagging. The ISR ‘limit' is a tempting divider, but has the side
> effect
> >> > > that, once you drop out you get immediately throttled. Adding a
> >> > > configurable divider is another option, but difficult for admins to
> >> set,
> >> > > and always a little arbitrary. A better idea is to prioritise, in
> >> reverse
> >> > > order to lag. But that also comes with additional complexity of its
> >> own.
> >> > >
> >> > > Under throttling is also a tempting addition. That’s to say, if
> there’s
> >> > > idle bandwidth lying around, not being used, why not use it to let
> >> > lagging
> >> > > brokers catch up. This involves some comparison to the maximum
> >> bandwidth,
> >> > > which could be configurable, or could be derived, with pros and cons
> >> for
> >> > > each.
> >> > >
> >> > > But the more general 

Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Jun Rao
Todd,

Thanks for the detailed reply. So, it sounds like you are ok with the
current proposal in the KIP for now and we can brainstorm on more automated
stuff separately? Are you comfortable with starting the vote on the current
proposal?

Jun

On Thu, Aug 18, 2016 at 11:00 AM, Todd Palino  wrote:

> Joel just reminded me to take another look at this one :) So first off,
> this is great. It’s something that we definitely need to have, especially
> as we get into the realm of moving partitions around more often.
>
> I do prefer to have the cluster handle this automatically. What I envision
> is a single configuration for “bootstrap replication quota” that is applied
> when we have a replica that is in this situation. There’s 2 legitimate
> cases that I’m aware of right now:
> 1 - We are moving a partition to a new replica. We know about this (at
> least the controller does), so we should be able to apply the quota without
> too much trouble here
> 2 - We have a broker that lost its disk and has to recover the partition
> from the cluster. Harder to detect, but in this case, I’m not sure I even
> want to throttle it because this is recovery activity.
>
> The problem with this becomes the question of “why”. Why are you moving a
> partition? Are you doing it because you want to balance traffic? Or are you
> doing it because you lost a piece of hardware and you need to get the RF
> for the partition back up to the desired level? As an admin, these have
> different priorities. I may be perfectly fine with having the replication
> traffic saturate the cluster in the latter case, because reliability and
> availability is more important than performance.
>
> Given the complexity of trying to determine intent, I’m going to agree with
> implementing a manual procedure for now. We definitely need to have a
> discussion about automating it as much as possible, but I think it’s part
> of a larger conversation about how much automation should be built into the
> broker itself, and how much should be part of a bolt-on “cluster manager”.
> I’m not sure putting all that complexity into the broker is the right
> choice.
>
> I do agree with Joel here that while a hard quota is typically better from
> a client point of view, in the case of replication traffic a soft quota is
> appropriate, and desirable. Probably a combination of both, as I think we
> still want a hard limit that stops short of saturating the entire cluster
> with replication traffic.
>
> -Todd
>
>
> On Thu, Aug 18, 2016 at 10:21 AM, Joel Koshy  wrote:
>
> > > For your first comment. We thought about determining "effect" replicas
> > > automatically as well. First, there are some tricky stuff that one has
> to
> > >
> >
> > Auto-detection of effect traffic: i'm fairly certain it's doable but
> > definitely tricky. I'm also not sure it is something worth tackling at
> the
> > outset. If we want to spend more time thinking over it even if it's just
> an
> > academic exercise I would be happy to brainstorm offline.
> >
> >
> > > For your second comment, we discussed that in the client quotas
> design. A
> > > down side of that for client quotas is that a client may be surprised
> > that
> > > its traffic is not throttled at one time, but throttled as another with
> > the
> > > same quota (basically, less predicability). You can imaging setting a
> > quota
> > > for all replication traffic and only slow down the "effect" replicas if
> > > needed. The thought is more or less the same as the above. It requires
> > more
> > >
> >
> > For clients, this is true. I think this is much less of an issue for
> > server-side replication since the "users" here are the Kafka SREs who
> > generally know these internal details.
> >
> > I think it would be valuable to get some feedback from SREs on the
> proposal
> > before proceeding to a vote. (ping Todd)
> >
> > Joel
> >
> >
> > >
> > > On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford 
> wrote:
> > >
> > > > Hi Joel
> > > >
> > > > Ha! yes we had some similar thoughts, on both counts. Both are
> actually
> > > > good approaches, but come with some extra complexity.
> > > >
> > > > Segregating the replication type is tempting as it creates a more
> > general
> > > > solution. One issue is you need to draw a line between lagging and
> not
> > > > lagging. The ISR ‘limit' is a tempting divider, but has the side
> effect
> > > > that, once you drop out you get immediately throttled. Adding a
> > > > configurable divider is another option, but difficult for admins to
> > set,
> > > > and always a little arbitrary. A better idea is to prioritise, in
> > reverse
> > > > order to lag. But that also comes with additional complexity of its
> > own.
> > > >
> > > > Under throttling is also a tempting addition. That’s to say, if
> there’s
> > > > idle bandwidth lying around, not being used, why not use it to let
> > > lagging
> > > > brokers catch up. This involves some comparison to the maximum
> > bandwidth,
> > > > which could be con

Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Todd Palino
Yeah, I’m good with where we are right now on this KIP. It’s a workable
solution that we can add tooling to support. I would prefer to have soft
quotas, but given that it is more complex to implement, we can go with hard
quotas for the time being and consider it as an improvement later.

-Todd


On Thu, Aug 18, 2016 at 11:13 AM, Jun Rao  wrote:

> Todd,
>
> Thanks for the detailed reply. So, it sounds like that you are ok with the
> current proposal in the KIP for now and we can brainstorm on more automated
> stuff separately? Are you comfortable with starting the vote on the current
> proposal?
>
> Jun
>
> On Thu, Aug 18, 2016 at 11:00 AM, Todd Palino  wrote:
>
> > Joel just reminded me to take another look at this one :) So first off,
> > this is great. It’s something that we definitely need to have, especially
> > as we get into the realm of moving partitions around more often.
> >
> > I do prefer to have the cluster handle this automatically. What I
> envision
> > is a single configuration for “bootstrap replication quota” that is
> applied
> > when we have a replica that is in this situation. There’s 2 legitimate
> > cases that I’m aware of right now:
> > 1 - We are moving a partition to a new replica. We know about this (at
> > least the controller does), so we should be able to apply the quota
> without
> > too much trouble here
> > 2 - We have a broker that lost its disk and has to recover the partition
> > from the cluster. Harder to detect, but in this case, I’m not sure I even
> > want to throttle it because this is recovery activity.
> >
> > The problem with this becomes the question of “why”. Why are you moving a
> > partition? Are you doing it because you want to balance traffic? Or are
> you
> > doing it because you lost a piece of hardware and you need to get the RF
> > for the partition back up to the desired level? As an admin, these have
> > different priorities. I may be perfectly fine with having the replication
> > traffic saturate the cluster in the latter case, because reliability and
> > availability is more important than performance.
> >
> > Given the complexity of trying to determine intent, I’m going to agree
> with
> > implementing a manual procedure for now. We definitely need to have a
> > discussion about automating it as much as possible, but I think it’s part
> > of a larger conversation about how much automation should be built into
> the
> > broker itself, and how much should be part of a bolt-on “cluster
> manager”.
> > I’m not sure putting all that complexity into the broker is the right
> > choice.
> >
> > I do agree with Joel here that while a hard quota is typically better
> from
> > a client point of view, in the case of replication traffic a soft quota
> is
> > appropriate, and desirable. Probably a combination of both, as I think we
> > still want a hard limit that stops short of saturating the entire cluster
> > with replication traffic.
> >
> > -Todd
> >
> >
> > On Thu, Aug 18, 2016 at 10:21 AM, Joel Koshy 
> wrote:
> >
> > > > For your first comment. We thought about determining "effect"
> replicas
> > > > automatically as well. First, there are some tricky stuff that one
> has
> > to
> > > >
> > >
> > > Auto-detection of effect traffic: i'm fairly certain it's doable but
> > > definitely tricky. I'm also not sure it is something worth tackling at
> > the
> > > outset. If we want to spend more time thinking over it even if it's
> just
> > an
> > > academic exercise I would be happy to brainstorm offline.
> > >
> > >
> > > > For your second comment, we discussed that in the client quotas
> > design. A
> > > > down side of that for client quotas is that a client may be surprised
> > > that
> > > > its traffic is not throttled at one time, but throttled as another
> with
> > > the
> > > > same quota (basically, less predicability). You can imaging setting a
> > > quota
> > > > for all replication traffic and only slow down the "effect" replicas
> if
> > > > needed. The thought is more or less the same as the above. It
> requires
> > > more
> > > >
> > >
> > > For clients, this is true. I think this is much less of an issue for
> > > server-side replication since the "users" here are the Kafka SREs who
> > > generally know these internal details.
> > >
> > > I think it would be valuable to get some feedback from SREs on the
> > proposal
> > > before proceeding to a vote. (ping Todd)
> > >
> > > Joel
> > >
> > >
> > > >
> > > > On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford 
> > wrote:
> > > >
> > > > > Hi Joel
> > > > >
> > > > > Ha! yes we had some similar thoughts, on both counts. Both are
> > actually
> > > > > good approaches, but come with some extra complexity.
> > > > >
> > > > > Segregating the replication type is tempting as it creates a more
> > > general
> > > > > solution. One issue is you need to draw a line between lagging and
> > not
> > > > > lagging. The ISR ‘limit' is a tempting divider, but has the side
> > effect
> > > > > that, once you drop o

Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Todd Palino
Congratulations, Gwen! Well deserved :)

-Todd


On Thu, Aug 18, 2016 at 10:27 AM, Gwen Shapira  wrote:

> Thanks team Kafka :) Very excited and happy to contribute and be part
> of this fantastic community.
>
>
>
> On Thu, Aug 18, 2016 at 9:52 AM, Guozhang Wang  wrote:
> > Congrats Gwen!
> >
> > On Thu, Aug 18, 2016 at 9:27 AM, Ashish Singh 
> wrote:
> >
> >> Congrats Gwen!
> >>
> >> On Thursday, August 18, 2016, Grant Henke  wrote:
> >>
> >> > Congratulations Gwen!
> >> >
> >> >
> >> >
> >> > On Thu, Aug 18, 2016 at 2:36 AM, Ismael Juma  >> > > wrote:
> >> >
> >> > > Congratulations Gwen! Great news.
> >> > >
> >> > > Ismael
> >> > >
> >> > > On 18 Aug 2016 2:44 am, "Jun Rao" >
> >> > wrote:
> >> > >
> >> > > > Hi, Everyone,
> >> > > >
> >> > > > Gwen Shapira has been active in the Kafka community since she
> became
> >> a
> >> > > > Kafka committer
> >> > > > about a year ago. I am glad to announce that Gwen is now a member
> of
> >> > > Kafka
> >> > > > PMC.
> >> > > >
> >> > > > Congratulations, Gwen!
> >> > > >
> >> > > > Jun
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Grant Henke
> >> > Software Engineer | Cloudera
> >> > gr...@cloudera.com  | twitter.com/gchenke |
> >> > linkedin.com/in/granthenke
> >> >
> >>
> >>
> >> --
> >> Ashish 🎤h
> >>
> >
> >
> >
> > --
> > -- Guozhang
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>



-- 
*Todd Palino*
Staff Site Reliability Engineer
Data Infrastructure Streaming



linkedin.com/in/toddpalino


Re: [VOTE] KIP-75 - Add per-connector Converters

2016-08-18 Thread Ewen Cheslack-Postava
Thanks everyone, this passes with 6 binding, 2 non-binding +1s.

-Ewen

On Mon, Aug 15, 2016 at 5:25 PM, Guozhang Wang  wrote:

> +1.
>
> On Mon, Aug 15, 2016 at 2:55 PM, Jason Gustafson 
> wrote:
>
> > +1 (non-binding)
> >
> > On Mon, Aug 15, 2016 at 2:53 PM, Gwen Shapira  wrote:
> >
> > > +1 (binding)
> > >
> > > On Mon, Aug 15, 2016 at 11:21 AM, Ewen Cheslack-Postava
> > >  wrote:
> > > > I would like to initiate the voting process for KIP-75:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 75+-+Add+per-connector+Converters
> > > >
> > > > I'll kick things off with a +1 (binding).
> > > >
> > > > --
> > > > Thanks,
> > > > Ewen
> > >
> > >
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
>
>
>
> --
> -- Guozhang
>



-- 
Thanks,
Ewen


Re: Should we have a KIP call?

2016-08-18 Thread Jun Rao
Grant,

That sounds like a good idea. I will send out an invite for this Tue at
11am. There are quite a few KIPs in your list and we probably can't cover
them all in one call. Perhaps we can do a quick status check on those that
have been voted on:
   - KIP-33: Add a time based log index
   - KIP-50: Move Authorizer to o.a.k.common package
   - KIP-55: Secure Quotas for Authenticated Users
   - KIP-67: Queryable state for Kafka Streams
   - KIP-70: Revise Partition Assignment Semantics on New Consumer's
     Subscription Change

and discuss some of the unvoted ones that are currently active and may need
more discussion?
   - Time-based releases for Apache Kafka
   - Java 7 support timeline
   - KIP-4: ACL Admin Schema
   - KIP-73: Replication Quotas
   - KIP-74: Add FetchResponse size limit in bytes

Thanks,

Jun

On Thu, Aug 18, 2016 at 10:55 AM, Grant Henke  wrote:

> I am thinking it might be a good time to have a Kafka KIP call. There are a
> lot of KIPs and discussions in progress that could benefit from a "quick"
> call to discuss, coordinate, and prioritize.
>
> Some of the voted topics we could discuss are:
> (I didn't include ones that were just voted or will pass just before the
> call)
>
>- KIP-33: Add a time based log index
>- KIP-50: Move Authorizer to o.a.k.common package
>- KIP-55: Secure Quotas for Authenticated Users
>- KIP-67: Queryable state for Kafka Streams
>- KIP-70: Revise Partition Assignment Semantics on New Consumer's
>Subscription Change
>
> Some of the un-voted topics we could discuss are:
>
>- Time-based releases for Apache Kafka
>- Java 7 support timeline
>- KIP-4: ACL Admin Schema
>- KIP-37 - Add Namespaces to Kafka
>- KIP-48: Delegation token support for Kafka
>- KIP-54: Sticky Partition Assignment Strategy
>- KIP-63: Unify store and downstream caching in streams
>- KIP-66: Add Kafka Connect Transformers to allow transformations to
>messages
>- KIP-72 Allow Sizing Incoming Request Queue in Bytes
>- KIP-73: Replication Quotas
>- KIP-74: Add FetchResponse size limit in bytes
>
> As a side note it may be worth moving some open KIPs to a "parked" list if
> they are not being actively worked on. We can include a reason why as well.
> Reasons could include being blocked, parked, dormant (no activity), or
> abandoned (creator isn't working on it and others can pick it up). We would
> need to ask the KIP creator or define some length of time before we call a
> KIP abandoned and available for pickup.
>
> Some KIPs which may be candidates to be "parked" in a first pass are:
>
>- KIP-6 - New reassignment partition logic for rebalancing (dormant)
>- KIP-14 - Tools standardization (dormant)
>- KIP-17 - Add HighwaterMarkOffset to OffsetFetchResponse (dormant)
>- KIP-18 - JBOD Support (dormant)
>- KIP-23 - Add JSON/CSV output and looping options to
>ConsumerGroupCommand (dormant)
>- KIP-27 - Conditional Publish (dormant)
>- KIP-30 - Allow for brokers to have plug-able consensus and meta data
>storage sub systems (dormant)
>- KIP-39: Pinning controller to broker (dormant)
>- KIP-44 - Allow Kafka to have a customized security protocol (dormant)
>- KIP-46 - Self Healing (dormant)
>- KIP-47 - Add timestamp-based log deletion policy (blocked - by KIP-33)
>- KIP-53 - Add custom policies for reconnect attempts to NetworkdClient
>- KIP-58 - Make Log Compaction Point Configurable (blocked - by KIP-33)
>- KIP-61: Add a log retention parameter for maximum disk space usage
>percentage (dormant)
>- KIP-68 Add a consumed log retention before log retention (dormant)
>
> Thank you,
> Grant
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Harsha Chintalapani
Congrats Gwen.
-Harsha

On Thu, Aug 18, 2016 at 10:59 AM Mayuresh Gharat 
wrote:

> Congrats Gwen :)
>
> Thanks,
>
> Mayuresh
>
> On Thu, Aug 18, 2016 at 10:27 AM, Gwen Shapira  wrote:
>
> > Thanks team Kafka :) Very excited and happy to contribute and be part
> > of this fantastic community.
> >
> >
> >
> > On Thu, Aug 18, 2016 at 9:52 AM, Guozhang Wang 
> wrote:
> > > Congrats Gwen!
> > >
> > > On Thu, Aug 18, 2016 at 9:27 AM, Ashish Singh 
> > wrote:
> > >
> > >> Congrats Gwen!
> > >>
> > >> On Thursday, August 18, 2016, Grant Henke 
> wrote:
> > >>
> > >> > Congratulations Gwen!
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Aug 18, 2016 at 2:36 AM, Ismael Juma  > >> > > wrote:
> > >> >
> > >> > > Congratulations Gwen! Great news.
> > >> > >
> > >> > > Ismael
> > >> > >
> > >> > > On 18 Aug 2016 2:44 am, "Jun Rao"  >
> > >> > wrote:
> > >> > >
> > >> > > > Hi, Everyone,
> > >> > > >
> > >> > > > Gwen Shapira has been active in the Kafka community since she
> > became
> > >> a
> > >> > > > Kafka committer
> > >> > > > about a year ago. I am glad to announce that Gwen is now a
> member
> > of
> > >> > > Kafka
> > >> > > > PMC.
> > >> > > >
> > >> > > > Congratulations, Gwen!
> > >> > > >
> > >> > > > Jun
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Grant Henke
> > >> > Software Engineer | Cloudera
> > >> > gr...@cloudera.com  | twitter.com/gchenke |
> > >> > linkedin.com/in/granthenke
> > >> >
> > >>
> > >>
> > >> --
> > >> Ashish 🎤h
> > >>
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
> >
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>


Re: [DISCUSS] KIP-73: Replication Quotas

2016-08-18 Thread Gwen Shapira
Yes, I think it's a great discussion to have. There are definitely pros
and cons to both approaches and worth thinking about the right way
forward.


On Thu, Aug 18, 2016 at 11:03 AM, Todd Palino  wrote:
> This all makes a lot of sense, and mirrors what I’m thinking as I finally
> took some time to really walk through scenarios around why we move
> partitions around.
>
> What I’m wondering if it makes sense to have a conversation around breaking
> out the controller entirely, separating it from the brokers, and starting
> to add this intelligence into that. I don’t think anyone will disagree that
> the controller needs a sizable amount of work. This definitely wouldn’t be
> the first project to separate out the brains from the dumb worker processes.
>
> -Todd
>
>
> On Thu, Aug 18, 2016 at 10:53 AM, Gwen Shapira  wrote:
>
>> Just my take, since Jun and Ben originally wanted to solve a more
>> general approach and I talked them out of it :)
>>
>> When we first add the feature, safety is probably most important in
>> getting people to adopt it - I wanted to make the feature very safe by
>> never throttling something admins don't want to throttle. So we
>> figured manual approach, while more challenging to configure, is the
>> safest. Admins usually know which replicas are "at risk" of taking
>> over and can choose to throttle them accordingly, they can build their
>> own integration with monitoring tools, etc.
>>
>> It feels like any "smarts" we try and build into Kafka can be done
>> better with external tools that can watch both Kafka traffic (with the
>> new metrics) and things like network and CPU monitors.
>>
>> We are open to a smarter approach in Kafka, but perhaps plan it for a
>> follow-up KIP? Maybe even after we have some experience with the
>> manual approach and how best to make throttling decisions.
>> Similar to what we do with choosing partitions to move around - we
>> started manually, admins are getting experience at how they like to
>> choose replicas and then we can bake their expertise into the product.
>>
>> Gwen
>>
>> On Thu, Aug 18, 2016 at 10:29 AM, Jun Rao  wrote:
>> > Joel,
>> >
>> > Yes, for your second comment. The tricky thing is still to figure out
>> which
>> > replicas to throttle and by how much since in general, admins probably
>> > don't want already in-sync or close to in-sync replicas to be throttled.
>> It
>> > would be great to get Todd's opinion on this. Could you ping him?
>> >
>> > Yes, we'd be happy to discuss auto-detection of effect traffic more
>> offline.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Thu, Aug 18, 2016 at 10:21 AM, Joel Koshy 
>> wrote:
>> >
>> >> > For your first comment. We thought about determining "effect" replicas
>> >> > automatically as well. First, there are some tricky stuff that one
>> has to
>> >> >
>> >>
>> >> Auto-detection of effect traffic: i'm fairly certain it's doable but
>> >> definitely tricky. I'm also not sure it is something worth tackling at
>> the
>> >> outset. If we want to spend more time thinking over it even if it's
>> just an
>> >> academic exercise I would be happy to brainstorm offline.
>> >>
>> >>
>> >> > For your second comment, we discussed that in the client quotas
>> design. A
>> >> > down side of that for client quotas is that a client may be surprised
>> >> that
>> >> > its traffic is not throttled at one time, but throttled as another
>> with
>> >> the
>> >> > same quota (basically, less predicability). You can imaging setting a
>> >> quota
>> >> > for all replication traffic and only slow down the "effect" replicas
>> if
>> >> > needed. The thought is more or less the same as the above. It requires
>> >> more
>> >> >
>> >>
>> >> For clients, this is true. I think this is much less of an issue for
>> >> server-side replication since the "users" here are the Kafka SREs who
>> >> generally know these internal details.
>> >>
>> >> I think it would be valuable to get some feedback from SREs on the
>> proposal
>> >> before proceeding to a vote. (ping Todd)
>> >>
>> >> Joel
>> >>
>> >>
>> >> >
>> >> > On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford 
>> wrote:
>> >> >
>> >> > > Hi Joel
>> >> > >
>> >> > > Ha! yes we had some similar thoughts, on both counts. Both are
>> actually
>> >> > > good approaches, but come with some extra complexity.
>> >> > >
>> >> > > Segregating the replication type is tempting as it creates a more
>> >> general
>> >> > > solution. One issue is you need to draw a line between lagging and
>> not
>> >> > > lagging. The ISR ‘limit' is a tempting divider, but has the side
>> effect
>> >> > > that, once you drop out you get immediately throttled. Adding a
>> >> > > configurable divider is another option, but difficult for admins to
>> >> set,
>> >> > > and always a little arbitrary. A better idea is to prioritise, in
>> >> reverse
>> >> > > order to lag. But that also comes with additional complexity of its
>> >> own.
>> >> > >
>> >> > > Under throttling is also a tempting addition. That’s to sa

[jira] [Assigned] (KAFKA-3949) Consumer topic subscription change may be ignored if a rebalance is in progress

2016-08-18 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reassigned KAFKA-3949:
--

Assignee: Jason Gustafson

> Consumer topic subscription change may be ignored if a rebalance is in 
> progress
> ---
>
> Key: KAFKA-3949
> URL: https://issues.apache.org/jira/browse/KAFKA-3949
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>
> The consumer's regex subscription works by matching all topics fetched from a 
> metadata update against the provided pattern. When a new topic is created or 
> an old topic is deleted, we update the list of subscribed topics and request 
> a rebalance by setting the {{needsPartitionAssignment}} flag inside 
> {{SubscriptionState}}. On the next call to {{poll()}}, the consumer will 
> observe the flag and begin the rebalance by sending a JoinGroup. The problem 
> is that it does not account for the fact that a rebalance could already be in 
> progress at the time the metadata is updated. This causes the following 
> sequence:
> 1. Rebalance begins (needsPartitionAssignment is set True)
> 2. Metadata max age expires and an update is triggered
> 3. Update returns and causes a topic subscription change 
> (needsPartitionAssignment set again to True).
> 4. Rebalance completes (needsPartitionAssignment is set False)
> In this situation, we will not request a new rebalance which will prevent us 
> from receiving an assignment from any topics added to the consumer's 
> subscription when the metadata was updated. This state will persist until 
> another event causes the group to rebalance.
> A related problem may occur if a rebalance is interrupted with the wakeup() 
> API, and the user calls subscribe(topics) with a change to the subscription 
> set. 
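
A conceptual sketch of the race described above (this is not the real
SubscriptionState code, and the generation-counter remedy shown is only one
possible fix): with a single boolean flag, a rebalance that began before the
subscription change can clear the flag and swallow the new request, whereas
tracking which request a finished rebalance actually served does not.

public class RebalanceFlagRace {
    private long requested = 0;  // bumped on every subscription change
    private long served = 0;     // the request generation a finished rebalance satisfied

    // Step 1: a rebalance starts; remember which generation it is serving.
    synchronized long rebalanceStarted() {
        return requested;
    }

    // Steps 2/3: a metadata update changes the subscription mid-rebalance.
    synchronized void subscriptionChanged() {
        requested++;
    }

    // Step 4: the rebalance completes, but only clears what it actually served,
    // so a change that arrived mid-rebalance still triggers another rebalance.
    synchronized void rebalanceFinished(long generationServed) {
        served = Math.max(served, generationServed);
    }

    synchronized boolean needsRebalance() {
        return served < requested;
    }
}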



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


OffsetOutOfRange errors during broker leader transitions

2016-08-18 Thread Bill Warshaw
We are running a 3-node deployment of Kafka, and on several of our testing
sites we have seen the following scenario occur:

- "auto.offset.reset" is set to "earliest"
- A client is reading from Kafka, and at some point the broker throws an
OffsetOutOfRangeException, causing the consumer to seek to the beginning of
the partition.  The client consumer is nowhere near the end of the
partition at this point; there are thousands more messages after this
offset.
- At the same point in time, Kafka undergoes a leader transition

It seems like the partition leader is incorrectly determining the length of
the partition during this leader transition phase, which causes it to think
that a valid offset is past the end of the partition.  Is this a known
issue?

Log comparison:

client.log
2016-08-10 08:06:22,612 [Client] INFO
 org.apache.kafka.clients.consumer.internals.Fetcher - Fetch offset 11891
is out of range, resetting offset

kafka server.log
2016-08-10 08:06:20,713 INFO New leader is 1001
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
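
For reference, a minimal consumer setup matching the scenario (the broker
address, group id and topic name below are placeholders, not taken from the
report): with auto.offset.reset set to earliest, an out-of-range fetch makes
the client reset to the beginning of the partition instead of failing.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EarliestResetConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "example-group");            // placeholder
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest");        // the setting discussed above

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));  // placeholder
            ConsumerRecords<String, String> records = consumer.poll(1000);
            System.out.println("Fetched " + records.count() + " records");
        }
    }
}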


[jira] [Created] (KAFKA-4062) Require --print-data-log if --offsets-decoder is enabled for DumpLogOffsets

2016-08-18 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4062:
--

 Summary: Require --print-data-log if --offsets-decoder is enabled 
for DumpLogOffsets
 Key: KAFKA-4062
 URL: https://issues.apache.org/jira/browse/KAFKA-4062
 Project: Kafka
  Issue Type: Improvement
  Components: admin
Reporter: Dustin Cote
Assignee: Dustin Cote
Priority: Minor


When using the DumpLogOffsets tool, if you want to print out contents of 
__consumer_offsets, you would typically use --offsets-decoder as an option.  
This option doesn't actually do much without --print-data-log enabled, so we 
should just require it to prevent user errors.
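
For context, an example invocation where both options would be supplied
together (assuming the tool referred to is kafka.tools.DumpLogSegments and that
the flag names are unchanged; the file path is a placeholder):

$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /path/to/__consumer_offsets-0/00000000000000000000.log --offsets-decoder --print-data-log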



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2170) 10 LogTest cases failed for file.renameTo failed under windows

2016-08-18 Thread Soumyajit Sahu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427096#comment-15427096
 ] 

Soumyajit Sahu commented on KAFKA-2170:
---

[~haraldk] Thanks for checking it out. I didn't try with compaction enabled. I 
will be able to dig more again after September 20th and get back to you.

At a glance, it looks like there must have been a previous error which left the 
original file in place, which then led to this:
java.nio.file.FileAlreadyExistsException: 
C:\Users\hk\tmp\kafka-data\hktest-0\.index -> 
C:\Users\hk\tmp\kafka-data\hktest-0\.index.deleted

In the meantime, could you kindly apply all three of the patches below and give it 
another test run?
https://github.com/apache/kafka/pull/1757
https://github.com/apache/kafka/pull/1716
https://github.com/apache/kafka/pull/1718

Thanks!
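
As background on the failure mode: java.io.File.renameTo is platform-dependent
and on Windows fails when the target already exists (or while the file is
memory-mapped), which is consistent with the FileAlreadyExistsException above.
A small illustrative sketch of the NIO alternative, not the contents of the
patches linked above:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RenameWithReplace {
    // Move 'from' to 'to', overwriting 'to' if it already exists; this avoids
    // the FileAlreadyExistsException seen when the target was left behind by
    // an earlier failure.
    static void rename(String from, String to) throws IOException {
        Path source = Paths.get(from);
        Path target = Paths.get(to);
        Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}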

> 10 LogTest cases failed for  file.renameTo failed under windows
> ---
>
> Key: KAFKA-2170
> URL: https://issues.apache.org/jira/browse/KAFKA-2170
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Windows
>Reporter: Honghai Chen
>Assignee: Jay Kreps
>
> get latest code from trunk, then run test 
> gradlew  -i core:test --tests kafka.log.LogTest
> Got 10 cases failed for same reason:
> kafka.common.KafkaStorageException: Failed to change the log file suffix from 
>  to .deleted for log segment 0
>   at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:259)
>   at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:756)
>   at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:747)
>   at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:514)
>   at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:514)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at kafka.log.Log.deleteOldSegments(Log.scala:514)
>   at kafka.log.LogTest.testAsyncDelete(LogTest.scala:633)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
>   at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:86)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:49)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:69)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:48)
>   at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at $Proxy2.processTestC

Re: [VOTE] KIP-74: Add FetchResponse size limit in bytes

2016-08-18 Thread Andrey L. Neporada
Hi all!
I’ve modified KIP-74 a little bit (as requested by Jason Gustafson & Jun Rao):
1) provided more detailed explanation on memory usage (no functional changes)
2) renamed “fetch.response.max.bytes” -> “fetch.max.bytes”

Let’s continue voting in this thread.

Thanks!
Andrey.

> On 17 Aug 2016, at 00:02, Jun Rao  wrote:
> 
> Andrey,
> 
> Thanks for the KIP. +1
> 
> Jun
> 
> On Tue, Aug 16, 2016 at 1:32 PM, Andrey L. Neporada <
> anepor...@yandex-team.ru> wrote:
> 
>> Hi!
>> 
>> I would like to initiate the voting process for KIP-74:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 74%3A+Add+Fetch+Response+Size+Limit+in+Bytes
>> 
>> 
>> Thanks,
>> Andrey.



[GitHub] kafka pull request #1733: KAFKA-4037: Make Connect REST API retries aware of...

2016-08-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1733


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4037) Transient failure in ConnectRestApiTest

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427284#comment-15427284
 ] 

ASF GitHub Bot commented on KAFKA-4037:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1733


> Transient failure in ConnectRestApiTest
> ---
>
> Key: KAFKA-4037
> URL: https://issues.apache.org/jira/browse/KAFKA-4037
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
> Fix For: 0.10.1.0, 0.10.0.2
>
>
> Transient failure in
> {code}
> Module: kafkatest.tests.connect_rest_test
> Class:  ConnectRestApiTest
> Method: test_rest_api
> {code}
> Failures found in (full list here: 
> http://testing.confluent.io/confluent-kafka-0-9-0-system-test-results/):
> 1/26, 2/1, 2/3, 2/6, 2/7
> Earliest known failure:
> 1/26/2016
> Top level error message in report.txt
> {code}
> test_id:
> 2016-01-26--001.kafkatest.tests.connect_rest_test.ConnectRestApiTest.test_rest_api
> status: FAIL
> run time:   2 minutes 0.132 seconds
> HTTPConnectionPool(host='172.31.27.101', port=8083): Max retries exceeded 
> with url: /connectors/local-file-source (Caused by 
> NewConnectionError(' object at 0x7fd3c9f6c650>: Failed to establish a new connection: [Errno 111] 
> Connection refused',))
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/tests/kafkatest/tests/connect_rest_test.py",
>  line 141, in test_rest_api
> self.cc.delete_connector("local-file-source")
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/tests/kafkatest/services/connect.py",
>  line 107, in delete_connector
> return self._rest('/connectors/' + name, node=node, method="DELETE")
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/tests/kafkatest/services/connect.py",
>  line 115, in _rest
> resp = meth(url, json=body)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/api.py",
>  line 145, in delete
> return request('delete', url, **kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/api.py",
>  line 53, in request
> return session.request(method=method, url=url, **kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/sessions.py",
>  line 468, in request
> resp = self.send(prep, **send_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/sessions.py",
>  line 576, in send
> r = adapter.send(request, **kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/adapters.py",
>  line 437, in send
> raise ConnectionError(e, request=request)
> ConnectionError: HTTPConnectionPool(host='172.31.27.101', port=8083): Max 
> retries exceeded with url: /connectors/local-file-source (Caused by 
> NewConnectionError(' object at 0x7fd3c9f6c650>: Failed to establish a new connection: [Errno 111] 
> Connection refused',))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4037) Transient failure in ConnectRestApiTest

2016-08-18 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-4037:
-
   Resolution: Fixed
Fix Version/s: 0.10.1.0
   0.10.0.2
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1733
[https://github.com/apache/kafka/pull/1733]
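
For context, pull request 1733 ("Make Connect REST API retries aware of 409 CONFLICT
errors", per the GitHub thread above) hardened the system test's REST retries. A rough,
illustrative Java sketch of that kind of retry loop follows; it is not the actual
ducktape helper, and the URL, attempt count and sleep interval are placeholder values.

{code}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ConnectRestRetrySketch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8083/connectors/local-file-source");
        for (int attempt = 0; attempt < 10; attempt++) {
            try {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setRequestMethod("DELETE");
                int code = conn.getResponseCode();
                if (code != 409) {                 // 409 => worker rebalancing; retry
                    System.out.println("HTTP " + code);
                    return;
                }
            } catch (IOException e) {
                // connection refused / worker not up yet; fall through and retry
            }
            Thread.sleep(1000);
        }
        throw new RuntimeException("Connect REST call did not succeed after retries");
    }
}
{code}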

> Transient failure in ConnectRestApiTest
> ---
>
> Key: KAFKA-4037
> URL: https://issues.apache.org/jira/browse/KAFKA-4037
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
> Fix For: 0.10.0.2, 0.10.1.0
>
>
> Transient failure in
> {code}
> Module: kafkatest.tests.connect_rest_test
> Class:  ConnectRestApiTest
> Method: test_rest_api
> {code}
> Failures found in (full list here: 
> http://testing.confluent.io/confluent-kafka-0-9-0-system-test-results/):
> 1/26, 2/1, 2/3, 2/6, 2/7
> Earliest known failure:
> 1/26/2016
> Top level error message in report.txt
> {code}
> test_id:
> 2016-01-26--001.kafkatest.tests.connect_rest_test.ConnectRestApiTest.test_rest_api
> status: FAIL
> run time:   2 minutes 0.132 seconds
> HTTPConnectionPool(host='172.31.27.101', port=8083): Max retries exceeded 
> with url: /connectors/local-file-source (Caused by 
> NewConnectionError(' object at 0x7fd3c9f6c650>: Failed to establish a new connection: [Errno 111] 
> Connection refused',))
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 101, in run_all_tests
> result.data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.3.8-py2.7.egg/ducktape/tests/runner.py",
>  line 151, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/tests/kafkatest/tests/connect_rest_test.py",
>  line 141, in test_rest_api
> self.cc.delete_connector("local-file-source")
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/tests/kafkatest/services/connect.py",
>  line 107, in delete_connector
> return self._rest('/connectors/' + name, node=node, method="DELETE")
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/tests/kafkatest/services/connect.py",
>  line 115, in _rest
> resp = meth(url, json=body)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/api.py",
>  line 145, in delete
> return request('delete', url, **kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/api.py",
>  line 53, in request
> return session.request(method=method, url=url, **kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/sessions.py",
>  line 468, in request
> resp = self.send(prep, **send_kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/sessions.py",
>  line 576, in send
> r = adapter.send(request, **kwargs)
>   File 
> "/var/lib/jenkins/workspace/kafka-0-9-0-0-system-test/kafka/venv/local/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/adapters.py",
>  line 437, in send
> raise ConnectionError(e, request=request)
> ConnectionError: HTTPConnectionPool(host='172.31.27.101', port=8083): Max 
> retries exceeded with url: /connectors/local-file-source (Caused by 
> NewConnectionError(' object at 0x7fd3c9f6c650>: Failed to establish a new connection: [Errno 111] 
> Connection refused',))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3438) Rack Aware Replica Reassignment should warn of overloaded brokers

2016-08-18 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427361#comment-15427361
 ] 

Vahid Hashemian commented on KAFKA-3438:


[~benstopford] Are you working on this? If not, I can take a look. Thanks.

> Rack Aware Replica Reassignment should warn of overloaded brokers
> -
>
> Key: KAFKA-3438
> URL: https://issues.apache.org/jira/browse/KAFKA-3438
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.0.0
>Reporter: Ben Stopford
> Fix For: 0.10.1.0
>
>
> We've changed the replica reassignment code to be rack aware.
> One problem that might catch users out would be that they rebalance the 
> cluster using kafka-reassign-partitions.sh but their rack configuration means 
> that some high proportion of replicas are pushed onto a single, or small 
> number of, brokers. 
> This should be an easy problem to avoid, by changing the rack assignment 
> information, but we should probably warn users if they are going to create 
> something that is unbalanced. 
> So imagine I have a Kafka cluster of 12 nodes spread over two racks with rack 
> awareness enabled. If I add a 13th machine, on a new rack, and run the 
> rebalance tool, that new machine will get ~6x as many replicas as the least 
> loaded broker. 
> Suggest a warning be added to the tool output when --generate is called: 
> "The most loaded broker has 2.3x as many replicas as the least loaded 
> broker. This is likely due to an uneven distribution of brokers across racks. 
> You're advised to alter the rack config so there are approximately the same 
> number of brokers per rack" and displays the individual rack→#brokers and 
> broker→#replicas data for the proposed move.  
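
As a rough illustration of the suggested check, a small Java sketch that computes the
most-loaded/least-loaded ratio from a proposed assignment and prints the warning; this
is not the actual reassignment tool code, and the 1.5x threshold and sample numbers are
arbitrary.

{code}
import java.util.Map;

public class ImbalanceWarningSketch {
    static void warnIfUnbalanced(Map<Integer, Integer> replicasPerBroker) {
        int max = replicasPerBroker.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        int min = replicasPerBroker.values().stream().mapToInt(Integer::intValue).min().orElse(1);
        double ratio = (double) max / min;
        if (ratio > 1.5) {   // arbitrary example threshold
            System.out.printf(
                "WARNING: the most loaded broker has %.1fx as many replicas as the least loaded "
                    + "broker. This is likely due to an uneven distribution of brokers across racks.%n",
                ratio);
        }
    }

    public static void main(String[] args) {
        // broker id -> proposed replica count
        warnIfUnbalanced(Map.of(1, 60, 2, 10, 3, 12));
    }
}
{code}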



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3054) Connect Herder fail forever if sent a wrong connector config or task config

2016-08-18 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427362#comment-15427362
 ] 

Shikhar Bhushan commented on KAFKA-3054:


Addressing this in KAFKA-4042, which should take care of remaining robustness 
issues in the {{DistributedHerder}} from bad connector or task configs.

> Connect Herder fail forever if sent a wrong connector config or task config
> ---
>
> Key: KAFKA-3054
> URL: https://issues.apache.org/jira/browse/KAFKA-3054
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: jin xing
>Assignee: Shikhar Bhushan
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> The Connect herder throws a ConnectException and shuts down if sent a wrong config, 
> and restarting the herder keeps failing with the same wrong config. It makes sense 
> that the herder should stay available when starting a connector or task fails. After 
> receiving a delete connector request, the herder can delete the wrong config 
> from "config storage".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3054) Connect Herder fail forever if sent a wrong connector config or task config

2016-08-18 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-3054.

Resolution: Done

> Connect Herder fail forever if sent a wrong connector config or task config
> ---
>
> Key: KAFKA-3054
> URL: https://issues.apache.org/jira/browse/KAFKA-3054
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: jin xing
>Assignee: Shikhar Bhushan
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> The Connect herder throws a ConnectException and shuts down if sent a wrong config, 
> and restarting the herder keeps failing with the same wrong config. It makes sense 
> that the herder should stay available when starting a connector or task fails. After 
> receiving a delete connector request, the herder can delete the wrong config 
> from "config storage".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4064) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-18 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4064:
---

 Summary: Add support for infinite endpoints for range queries in 
Kafka Streams KV stores
 Key: KAFKA-4064
 URL: https://issues.apache.org/jira/browse/KAFKA-4064
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
 Fix For: 0.10.1.0


In some applications, it's useful to iterate over the key-value store either:
1. from the beginning up to a certain key
2. from a certain key to the end

We can add two new methods rangeUntil() and rangeFrom() easily to support this.
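
As a sketch of what the proposal might look like on the Streams key-value store read
interface (method names follow the description above; the exact signatures and the
inclusivity of the bounds are assumptions, not the final API):

{code}
import org.apache.kafka.streams.state.KeyValueIterator;

// Hypothetical extension of the read-only KV store interface; not the final API.
public interface OpenEndedRangeQueries<K, V> {
    KeyValueIterator<K, V> range(K from, K to);   // existing bounded range query
    KeyValueIterator<K, V> rangeUntil(K to);      // proposed: beginning of the store up to 'to'
    KeyValueIterator<K, V> rangeFrom(K from);     // proposed: 'from' through the end of the store
}
{code}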



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4063) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-18 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4063:
---

 Summary: Add support for infinite endpoints for range queries in 
Kafka Streams KV stores
 Key: KAFKA-4063
 URL: https://issues.apache.org/jira/browse/KAFKA-4063
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
 Fix For: 0.10.1.0


In some applications, it's useful to iterate over the key-value store either:
1. from the beginning up to a certain key
2. from a certain key to the end

We can add two new methods rangeUntil() and rangeFrom() easily to support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1761: KAFKA-4064 Add support for infinite endpoints for ...

2016-08-18 Thread theduderog
GitHub user theduderog opened a pull request:

https://github.com/apache/kafka/pull/1761

KAFKA-4064 Add support for infinite endpoints for range queries in Kafka 
Streams KV stores

@guozhangwang 

I had to fix the bug with in-memory ranges being exclusive and RocksDB 
being inclusive to get meaningful tests to work.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/theduderog/kafka range-query

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1761.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1761


commit c286dab5c2869c96e514168d6f0de604813da78d
Author: Roger Hoover 
Date:   2016-08-18T23:12:22Z

First pass

commit 1b5671db15acea2cb6bf79a2cb31e101f46bd7d5
Author: Roger Hoover 
Date:   2016-08-18T23:29:54Z

Added unit tests

commit 92b6f2630909d46c21070c634b5f9efe8e9cc9c7
Author: Roger Hoover 
Date:   2016-08-18T23:44:39Z

Clean up




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4064) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427374#comment-15427374
 ] 

ASF GitHub Bot commented on KAFKA-4064:
---

GitHub user theduderog opened a pull request:

https://github.com/apache/kafka/pull/1761

KAFKA-4064 Add support for infinite endpoints for range queries in Kafka 
Streams KV stores

@guozhangwang 

I had to fix the bug with in-memory ranges being exclusive and RocksDB 
being inclusive to get meaningful tests to work.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/theduderog/kafka range-query

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1761.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1761


commit c286dab5c2869c96e514168d6f0de604813da78d
Author: Roger Hoover 
Date:   2016-08-18T23:12:22Z

First pass

commit 1b5671db15acea2cb6bf79a2cb31e101f46bd7d5
Author: Roger Hoover 
Date:   2016-08-18T23:29:54Z

Added unit tests

commit 92b6f2630909d46c21070c634b5f9efe8e9cc9c7
Author: Roger Hoover 
Date:   2016-08-18T23:44:39Z

Clean up




> Add support for infinite endpoints for range queries in Kafka Streams KV 
> stores
> ---
>
> Key: KAFKA-4064
> URL: https://issues.apache.org/jira/browse/KAFKA-4064
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> In some applications, it's useful to iterate over the key-value store either:
> 1. from the beginning up to a certain key
> 2. from a certain key to the end
> We can add two new methods rangeUntil() and rangeFrom() easily to support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #822

2016-08-18 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-4037: Make Connect REST API retries aware of 409 CONFLICT errors

--
[...truncated 11997 lines...]
org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoinTest > 
testNotSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoinTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueWithProvidedSerde STARTED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueWithProvidedSerde PASSED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValuesWithName STARTED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValuesWithName PASSED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueDefaultSerde STARTED

org.apache.kafka.streams.kstream.internals.KeyValuePrinterProcessorTest > 
testPrintKeyValueDefaultSerde PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testNotSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testKTable STARTED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testKTable PASSED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testValueGetter 
STARTED

org.apache.kafka.streams.kstream.internals.KTableSourceTest > testValueGetter 
PASSED

org.apache.kafka.streams.kstream.internals.KTableMapKeysTest > 
testMapKeysConvertingToStream STARTED

org.apache.kafka.streams.kstream.internals.KTableMapKeysTest > 
testMapKeysConvertingToStream PASSED

org.apache.kafka.streams.kstream.internals.KStreamForeachTest > testForeach 
STARTED

org.apache.kafka.streams.kstream.internals.KStreamForeachTest > testForeach 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > testJoin 
STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testNotSendingOldValue STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableOuterJoinTest > 
testNotSendingOldValue PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testOuterJoin STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testOuterJoin PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > testJoin 
STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testWindowing STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testWindowing PASSED

org.apache.kafka.streams.kstream.internals.KStreamFlatMapValuesTest > 
testFlatMapValues STARTED

org.apache.kafka.streams.kstream.internals.KStreamFlatMapValuesTest > 
testFlatMapValues PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > testJoin 
STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testNotSendingOldValues STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testNotSendingOldValues PASSED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testSendingOldValues STARTED

org.apache.kafka.streams.kstream.internals.KTableKTableJoinTest > 
testSendingOldValues PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testAggBasic 
STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testAggBasic 
PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testCount 
STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testCount 
PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggRepartition STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggRepartition PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testRemoveOldBeforeAddNew STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testRemoveOldBeforeAddNew PASSED

org.apache.kafka.streams.kstream.internals.KStreamFilterTest > testFilterNot 
STARTED

org.apac

Build failed in Jenkins: kafka-trunk-jdk7 #1482

2016-08-18 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-4037: Make Connect REST API retries aware of 409 CONFLICT errors

--
[...truncated 11979 lines...]
org.apache.kafka.streams.processor.internals.MinTimestampTrackerTest > 
testTracking PASSED

org.apache.kafka.streams.processor.internals.StreamTaskTest > testProcessOrder 
STARTED

org.apache.kafka.streams.processor.internals.StreamTaskTest > testProcessOrder 
PASSED

org.apache.kafka.streams.processor.internals.StreamTaskTest > testPauseResume 
STARTED

org.apache.kafka.streams.processor.internals.StreamTaskTest > testPauseResume 
PASSED

org.apache.kafka.streams.processor.internals.StreamTaskTest > 
testMaybePunctuate STARTED

org.apache.kafka.streams.processor.internals.StreamTaskTest > 
testMaybePunctuate PASSED

org.apache.kafka.streams.processor.internals.assignment.AssignmentInfoTest > 
testEncodeDecode STARTED

org.apache.kafka.streams.processor.internals.assignment.AssignmentInfoTest > 
testEncodeDecode PASSED

org.apache.kafka.streams.processor.internals.assignment.AssignmentInfoTest > 
shouldDecodePreviousVersion STARTED

org.apache.kafka.streams.processor.internals.assignment.AssignmentInfoTest > 
shouldDecodePreviousVersion PASSED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
testEncodeDecode STARTED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
testEncodeDecode PASSED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
shouldEncodeDecodeWithUserEndPoint STARTED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
shouldEncodeDecodeWithUserEndPoint PASSED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
shouldBeBackwardCompatible STARTED

org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfoTest > 
shouldBeBackwardCompatible PASSED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testStickiness STARTED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testStickiness PASSED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testAssignWithStandby STARTED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testAssignWithStandby PASSED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testAssignWithoutStandby STARTED

org.apache.kafka.streams.processor.internals.assignment.TaskAssignorTest > 
testAssignWithoutStandby PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingMultiplexingTopology STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingMultiplexingTopology PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingStatefulTopology STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingStatefulTopology PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingSimpleTopology STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingSimpleTopology PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testTopologyMetadata STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testTopologyMetadata PASSED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingMultiplexByNameTopology STARTED

org.apache.kafka.streams.processor.internals.ProcessorTopologyTest > 
testDrivingMultiplexByNameTopology PASSED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testSpecificPartition STARTED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testSpecificPartition PASSED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testStreamPartitioner STARTED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testStreamPartitioner PASSED

org.apache.kafka.streams.processor.internals.PunctuationQueueTest > 
testPunctuationInterval STARTED

org.apache.kafka.streams.processor.internals.PunctuationQueueTest > 
testPunctuationInterval PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldThrowExceptionIfApplicationServerConfigIsNotHostPortPair STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldThrowExceptionIfApplicationServerConfigIsNotHostPortPair PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldMapUserEndPointToTopicPartitions STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldMapUserEndPointToTopicPartitions PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldAddUserDefinedEndPointToSubscription STARTED

org.apache.kafka.streams.processor.internals.StreamPar

Build failed in Jenkins: kafka-0.10.0-jdk7 #190

2016-08-18 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-4037: Make Connect REST API retries aware of 409 CONFLICT errors

--
[...truncated 6449 lines...]
org.apache.kafka.connect.runtime.WorkerSourceTaskTest > testCommit PASSED

org.apache.kafka.connect.runtime.WorkerSourceTaskTest > testCommitFailure PASSED

org.apache.kafka.connect.runtime.WorkerSourceTaskTest > 
testSendRecordsConvertsData PASSED

org.apache.kafka.connect.runtime.WorkerSourceTaskTest > testSendRecordsRetries 
PASSED

org.apache.kafka.connect.runtime.WorkerSourceTaskTest > 
testSendRecordsTaskCommitRecordFail PASSED

org.apache.kafka.connect.runtime.WorkerSourceTaskTest > testSlowTaskStart PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testPutConnectorConfig PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartTaskRedirectToOwner PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testConnectorConfigAdded PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testConnectorConfigUpdate PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testConnectorPaused PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testConnectorResumed PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testUnknownConnectorPaused PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testConnectorPausedRunningTaskOnly PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testConnectorResumedRunningTaskOnly PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testTaskConfigAdded PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testJoinLeaderCatchUpFails PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testAccessors PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testInconsistentConfigs PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testJoinAssignment PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalance PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalanceFailedConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testHaltCleansUpWorker PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorAlreadyExists PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testDestroyConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartUnknownConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartConnectorRedirectToLeader PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartConnectorRedirectToOwner PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartTask PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartUnknownTask PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartTaskRedirectToLeader PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupFollower PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testMetadata PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment1 PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment2 PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testJoinLeaderCannotAssign PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testRejoinGroup PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupLeader PASSED

org.apache.kafka.connect.runtime.standalone.StandaloneHerderTest > 
testPutConnectorConfig PASSED

org.apache.kafka.connect.runtime.standalone.StandaloneHerderTest > 
testPutTaskConfigs PASSED

org.apache.kafka.connect.runtime.standalone.StandaloneHerderTest > 
testAccessors PASSED

org.apache.kafka.connect.runtime.standalone.StandaloneHerderTest > 
testCreateConnectorAlreadyExists PASSED

org.apache.kafka.connect.runtime.standalone.StandaloneHerderTest > 
testDestroyConnector PASSED

org.apache.kafka.connect.runtime.standalone.StandaloneHerderTest > 
testRestartConnector PASSED

org.apache.kafka.connect.runtime.standalone.StandaloneHerderTest > 
testRestartTask PASSED

org.apache.kafka.connect.runtime.standalone.StandaloneHerderTest > 
test

[jira] [Commented] (KAFKA-4051) Strange behavior during rebalance when turning the OS clock back

2016-08-18 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427554#comment-15427554
 ] 

Gwen Shapira commented on KAFKA-4051:
-

I was definitely concerned about the complete replacement throughout the code, 
especially when there was no justification in most places.

I think going with option 1, doing performance comparisons, and documenting 
exactly why we use nanoTime there would be good.
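
For reference, a minimal Java sketch of why a monotonic clock matters for measuring
elapsed time (illustrative only, not Kafka code):

{code}
import java.util.concurrent.TimeUnit;

public class ElapsedTimeSketch {
    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.currentTimeMillis();
        long monoStart = System.nanoTime();

        Thread.sleep(1000);   // stand-in for the interval being measured

        // Wall clock: can jump or even go negative if the OS clock is changed (NTP, admin).
        long wallElapsedMs = System.currentTimeMillis() - wallStart;

        // Monotonic clock: unaffected by wall-clock changes; suited to timeouts and heartbeats.
        long monoElapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - monoStart);

        System.out.printf("wall=%dms mono=%dms%n", wallElapsedMs, monoElapsedMs);
    }
}
{code}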

> Strange behavior during rebalance when turning the OS clock back
> 
>
> Key: KAFKA-4051
> URL: https://issues.apache.org/jira/browse/KAFKA-4051
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.0.0
> Environment: OS: Ubuntu 14.04 - 64bits
>Reporter: Gabriel Ibarra
>Assignee: Rajini Sivaram
>
> If a rebalance is performed after turning the OS clock back, the Kafka 
> server enters a loop and the rebalance cannot complete until the 
> system clock returns to the previous date/time.
> Steps to Reproduce:
> - Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will own 
> all the partitions.
> - Turn the system (OS) clock back, for instance by 1 hour.
> - Start a new consumer for TOPIC_NAME using the same group id; this forces 
> a rebalance.
> After these actions the Kafka server logs constantly display the messages 
> below, and after a while both consumers stop receiving messages. This 
> condition lasts at least as long as the clock was turned back (1 hour in 
> this example); after that, Kafka starts working again.
> [2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 
> is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
> Because some systems may have NTP enabled, or an administrator may need to 
> change the system clock (date/time), it's important that this can be done 
> safely. Currently the only safe way is the following:
> 1- Shut down the Kafka server.
> 2- Change the date/time.
> 3- Start the Kafka server again.
> But this approach only works when the change is made by an 
> administrator, not by NTP. Also, on many systems shutting down the Kafka 
> server might cause information to be lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1732: MINOR: Clarification in producer config documentat...

2016-08-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1732


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1740: MINOR: Remove # from .bat start script

2016-08-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1740


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1721: KAFKA-3845: KIP-75: Add per-connector converters

2016-08-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1721


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3845) Support per-connector converters

2016-08-18 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-3845:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1721
[https://github.com/apache/kafka/pull/1721]

> Support per-connector converters
> 
>
> Key: KAFKA-3845
> URL: https://issues.apache.org/jira/browse/KAFKA-3845
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> While good for default configuration and reducing the total configuration the 
> user needs to do, it's inconvenient requiring that all connectors on a 
> cluster need to use the same converter. It's definitely a good idea to stay 
> consistent, but occasionally you may need a special converter, e.g. one 
> source of data happens to come in JSON despite you standardizing on Avro.
> Note that these configs are connector-level in the sense that the entire 
> connector should use a single converter type, but since converters are used 
> by tasks the config needs to be automatically propagated to tasks.
> This is effectively a public API change as it is adding a built-in config for 
> connectors/tasks, so this probably requires a KIP.
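
As a rough sketch of what a per-connector override could look like once this lands,
expressed as a connector config map (the connector name, class, topic and file path are
placeholders, and the exact config keys should be checked against the final KIP/docs):

{code}
import java.util.HashMap;
import java.util.Map;

public class PerConnectorConverterSketch {
    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("name", "json-file-source");
        config.put("connector.class", "org.apache.kafka.connect.file.FileStreamSourceConnector");
        config.put("topic", "json-events");
        config.put("file", "/tmp/events.json");
        // Override the worker-wide converter (e.g. Avro) for this connector only;
        // the framework propagates these settings to the connector's tasks.
        config.put("key.converter", "org.apache.kafka.connect.json.JsonConverter");
        config.put("value.converter", "org.apache.kafka.connect.json.JsonConverter");
        System.out.println(config);
    }
}
{code}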



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3845) Support per-connector converters

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427571#comment-15427571
 ] 

ASF GitHub Bot commented on KAFKA-3845:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1721


> Support per-connector converters
> 
>
> Key: KAFKA-3845
> URL: https://issues.apache.org/jira/browse/KAFKA-3845
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> While good for default configuration and reducing the total configuration the 
> user needs to do, it's inconvenient requiring that all connectors on a 
> cluster need to use the same converter. It's definitely a good idea to stay 
> consistent, but occasionally you may need a special converter, e.g. one 
> source of data happens to come in JSON despite you standardizing on Avro.
> Note that these configs are connector-level in the sense that the entire 
> connector should use a single converter type, but since converters are used 
> by tasks the config needs to be automatically propagated to tasks.
> This is effectively a public API change as it is adding a built-in config for 
> connectors/tasks, so this probably requires a KIP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3949) Consumer topic subscription change may be ignored if a rebalance is in progress

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427604#comment-15427604
 ] 

ASF GitHub Bot commented on KAFKA-3949:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1762

KAFKA-3949: Fix race condition when metadata update arrives during rebalance



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-3949

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1762.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1762


commit 2ce2e1f354740f07899d452e5f2e086cee759ebf
Author: Jason Gustafson 
Date:   2016-08-19T04:26:27Z

KAFKA-3949: Fix race condition when metadata update arrives during rebalance




> Consumer topic subscription change may be ignored if a rebalance is in 
> progress
> ---
>
> Key: KAFKA-3949
> URL: https://issues.apache.org/jira/browse/KAFKA-3949
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>
> The consumer's regex subscription works by matching all topics fetched from a 
> metadata update against the provided pattern. When a new topic is created or 
> an old topic is deleted, we update the list of subscribed topics and request 
> a rebalance by setting the {{needsPartitionAssignment}} flag inside 
> {{SubscriptionState}}. On the next call to {{poll()}}, the consumer will 
> observe the flag and begin the rebalance by sending a JoinGroup. The problem 
> is that it does not account for the fact that a rebalance could already be in 
> progress at the time the metadata is updated. This causes the following 
> sequence:
> 1. Rebalance begins (needsPartitionAssignment is set True)
> 2. Metadata max age expires and an update is triggered
> 3. Update returns and causes a topic subscription change 
> (needsPartitionAssignment set again to True).
> 4. Rebalance completes (needsPartitionAssignment is set False)
> In this situation, we will not request a new rebalance, which prevents us 
> from receiving an assignment for any topics added to the consumer's 
> subscription when the metadata was updated. This state will persist until 
> another event causes the group to rebalance.
> A related problem may occur if a rebalance is interrupted with the wakeup() 
> API, and the user calls subscribe(topics) with a change to the subscription 
> set. 
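
To make the lost update concrete, a small illustrative Java sketch follows; it is not
the consumer's actual code, and the counter-based variant is just one generic way to
avoid this class of race, not necessarily what PR 1762 does.

{code}
// Illustrative only -- NOT the consumer's actual code. Shows how a single boolean
// "needs rebalance" flag loses the request made in step 3 when the in-flight
// rebalance from step 1 completes in step 4.
public class RebalanceFlagSketch {
    static class BooleanFlag {
        private boolean needsPartitionAssignment;
        void requestRebalance()   { needsPartitionAssignment = true;  }
        void rebalanceCompleted() { needsPartitionAssignment = false; }
        boolean rebalanceNeeded() { return needsPartitionAssignment;  }
    }

    // One generic way to avoid the race: count requests so a completion only
    // acknowledges the request that was outstanding when the rebalance started.
    static class RequestCounter {
        private long requested, completed;
        long requestRebalance()                { return ++requested; }
        void rebalanceCompleted(long sequence) { completed = sequence; }
        boolean rebalanceNeeded()              { return completed < requested; }
    }

    public static void main(String[] args) {
        BooleanFlag flag = new BooleanFlag();
        flag.requestRebalance();            // 1. rebalance begins
        flag.requestRebalance();            // 3. metadata update changes the subscription
        flag.rebalanceCompleted();          // 4. first rebalance completes
        System.out.println("boolean flag still needs rebalance? " + flag.rebalanceNeeded());   // false: request lost

        RequestCounter counter = new RequestCounter();
        long first = counter.requestRebalance();   // 1.
        counter.requestRebalance();                // 3.
        counter.rebalanceCompleted(first);         // 4.
        System.out.println("counter still needs rebalance? " + counter.rebalanceNeeded());     // true: request preserved
    }
}
{code}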



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1762: KAFKA-3949: Fix race condition when metadata updat...

2016-08-18 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1762

KAFKA-3949: Fix race condition when metadata update arrives during rebalance



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-3949

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1762.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1762


commit 2ce2e1f354740f07899d452e5f2e086cee759ebf
Author: Jason Gustafson 
Date:   2016-08-19T04:26:27Z

KAFKA-3949: Fix race condition when metadata update arrives during rebalance




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-4063) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-18 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy resolved KAFKA-4063.

Resolution: Duplicate

> Add support for infinite endpoints for range queries in Kafka Streams KV 
> stores
> ---
>
> Key: KAFKA-4063
> URL: https://issues.apache.org/jira/browse/KAFKA-4063
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> In some applications, it's useful to iterate over the key-value store either:
> 1. from the beginning up to a certain key
> 2. from a certain key to the end
> We can add two new methods rangeUntil() and rangeFrom() easily to support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-74: Add FetchResponse size limit in bytes

2016-08-18 Thread Jason Gustafson
+1 (non-binding)

Thanks Andrey!

On Thu, Aug 18, 2016 at 1:47 PM, Andrey L. Neporada <
anepor...@yandex-team.ru> wrote:

> Hi all!
> I’ve modified KIP-74 a little bit (as requested by Jason Gustafson & Jun
> Rao):
> 1) provided more detailed explanation on memory usage (no functional
> changes)
> 2) renamed “fetch.response.max.bytes” -> “fetch.max.bytes”
>
> Let’s continue voting in this thread.
>
> Thanks!
> Andrey.
>
> > On 17 Aug 2016, at 00:02, Jun Rao  wrote:
> >
> > Andrey,
> >
> > Thanks for the KIP. +1
> >
> > Jun
> >
> > On Tue, Aug 16, 2016 at 1:32 PM, Andrey L. Neporada <
> > anepor...@yandex-team.ru> wrote:
> >
> >> Hi!
> >>
> >> I would like to initiate the voting process for KIP-74:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 74%3A+Add+Fetch+Response+Size+Limit+in+Bytes
> >>
> >>
> >> Thanks,
> >> Andrey.
>
>


Re: [VOTE] KIP-74: Add FetchResponse size limit in bytes

2016-08-18 Thread Gwen Shapira
+1 (binding)

On Thu, Aug 18, 2016 at 1:47 PM, Andrey L. Neporada
 wrote:
> Hi all!
> I’ve modified KIP-74 a little bit (as requested by Jason Gustafson & Jun Rao):
> 1) provided more detailed explanation on memory usage (no functional changes)
> 2) renamed “fetch.response.max.bytes” -> “fetch.max.bytes”
>
> Let’s continue voting in this thread.
>
> Thanks!
> Andrey.
>
>> On 17 Aug 2016, at 00:02, Jun Rao  wrote:
>>
>> Andrey,
>>
>> Thanks for the KIP. +1
>>
>> Jun
>>
>> On Tue, Aug 16, 2016 at 1:32 PM, Andrey L. Neporada <
>> anepor...@yandex-team.ru> wrote:
>>
>>> Hi!
>>>
>>> I would like to initiate the voting process for KIP-74:
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 74%3A+Add+Fetch+Response+Size+Limit+in+Bytes
>>>
>>>
>>> Thanks,
>>> Andrey.
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [VOTE] KIP-74: Add FetchResponse size limit in bytes

2016-08-18 Thread Manikumar Reddy
+1 (non-binding)

This feature helps us control the memory footprint and allows the consumer to
make progress when fetching large messages.

On Fri, Aug 19, 2016 at 10:32 AM, Gwen Shapira  wrote:

> +1 (binding)
>
> On Thu, Aug 18, 2016 at 1:47 PM, Andrey L. Neporada
>  wrote:
> > Hi all!
> > I’ve modified KIP-74 a little bit (as requested by Jason Gustafson & Jun
> Rao):
> > 1) provided more detailed explanation on memory usage (no functional
> changes)
> > 2) renamed “fetch.response.max.bytes” -> “fetch.max.bytes”
> >
> > Let’s continue voting in this thread.
> >
> > Thanks!
> > Andrey.
> >
> >> On 17 Aug 2016, at 00:02, Jun Rao  wrote:
> >>
> >> Andrey,
> >>
> >> Thanks for the KIP. +1
> >>
> >> Jun
> >>
> >> On Tue, Aug 16, 2016 at 1:32 PM, Andrey L. Neporada <
> >> anepor...@yandex-team.ru> wrote:
> >>
> >>> Hi!
> >>>
> >>> I would like to initiate the voting process for KIP-74:
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> 74%3A+Add+Fetch+Response+Size+Limit+in+Bytes
> >>>
> >>>
> >>> Thanks,
> >>> Andrey.
> >
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


[jira] [Resolved] (KAFKA-3204) ConsumerConnector blocked on Authenticated by SASL Failed.

2016-08-18 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy resolved KAFKA-3204.

Resolution: Won't Fix

> ConsumerConnector blocked on Authenticated by SASL Failed.
> --
>
> Key: KAFKA-3204
> URL: https://issues.apache.org/jira/browse/KAFKA-3204
> Project: Kafka
>  Issue Type: Bug
>  Components: zkclient
>Affects Versions: 0.8.2.0
>Reporter: shenyuan wang
>
> We've set up Kafka to use ZK authentication and tested authentication 
> failures. The test program blocked and repeatedly retried the ZK connection. 
> The log repeats entries like this:
> 2016-02-04 17:24:47 INFO  FourLetterWordMain:46 - connecting to 10.75.202.42 
> 24002
> 2016-02-04 17:24:47 INFO  ClientCnxn:1211 - Opening socket connection to 
> server 10.75.202.42/10.75.202.42:24002. Will not attempt to authenticate 
> using SASL (unknown error)
> 2016-02-04 17:24:47 INFO  ClientCnxn:981 - Socket connection established, 
> initiating session, client: /10.61.22.215:56060, server: 
> 10.75.202.42/10.75.202.42:24002
> 2016-02-04 17:24:47 INFO  ClientCnxn:1472 - Session establishment complete on 
> server 10.75.202.42/10.75.202.42:24002, sessionid = 0xd013ed38238d1aa, 
> negotiated timeout = 4000
> 2016-02-04 17:24:47 INFO  ZkClient:711 - zookeeper state changed 
> (SyncConnected)
> 2016-02-04 17:24:47 INFO  ClientCnxn:1326 - Unable to read additional data 
> from server sessionid 0xd013ed38238d1aa, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2016-02-04 17:24:47 INFO  ZkClient:711 - zookeeper state changed 
> (Disconnected)
> 2016-02-04 17:24:47 INFO  ZkClient:934 - Waiting for keeper state 
> SyncConnected



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1745: KAFKA-4042: prevent DistributedHerder thread from ...

2016-08-18 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/1745


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1745: KAFKA-4042: prevent DistributedHerder thread from ...

2016-08-18 Thread shikhar
GitHub user shikhar reopened a pull request:

https://github.com/apache/kafka/pull/1745

KAFKA-4042: prevent DistributedHerder thread from dying from connector/task 
lifecycle exceptions

- `worker.startConnector()` and `worker.startTask()` can throw (e.g. 
`ClassNotFoundException`, `ConnectException`, or any other exception arising 
from the constructor of the connector or task class when we `newInstance()`), 
so add catch blocks around those calls from the `DistributedHerder` and handle 
by invoking `onFailure()` which updates the `StatusBackingStore`.
- `worker.stopConnector()` throws `ConnectException` if start failed 
causing the connector to not be registered with the worker, so guard with 
`worker.ownsConnector()`
- `worker.stopTasks()` and `worker.awaitStopTasks()` throw 
`ConnectException` if any of them failed to start and are hence not registered 
with the worker, so guard those calls by filtering the task IDs with 
`worker.ownsTask()`
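
As a minimal sketch of the guard pattern described in the bullets above (not the actual
DistributedHerder code; the Worker/StatusListener types and method signatures here are
simplified stand-ins):

{code}
public class HerderGuardSketch {
    interface Worker {
        void startConnector(String name);        // may throw on a bad config / missing class
        void stopConnector(String name);         // throws if the connector never started
        boolean ownsConnector(String name);
    }

    interface StatusListener {
        void onFailure(String name, Throwable cause);
    }

    private final Worker worker;
    private final StatusListener status;

    HerderGuardSketch(Worker worker, StatusListener status) {
        this.worker = worker;
        this.status = status;
    }

    void startConnectorSafely(String name) {
        try {
            worker.startConnector(name);
        } catch (Throwable t) {
            status.onFailure(name, t);           // record FAILED instead of killing the herder thread
        }
    }

    void stopConnectorSafely(String name) {
        if (worker.ownsConnector(name)) {        // guard: only stop what actually started
            worker.stopConnector(name);
        }
    }

    public static void main(String[] args) {
        Worker failing = new Worker() {
            public void startConnector(String name) { throw new RuntimeException("bad connector config"); }
            public void stopConnector(String name)  { }
            public boolean ownsConnector(String name) { return false; }
        };
        HerderGuardSketch herder = new HerderGuardSketch(
            failing, (name, cause) -> System.out.println(name + " FAILED: " + cause.getMessage()));
        herder.startConnectorSafely("bad-connector");   // herder survives the exception
        herder.stopConnectorSafely("bad-connector");    // no-op: worker never registered it
    }
}
{code}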

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka distherder-stayup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1745


commit 4bb02e610b01d7b425f5c39b435d4d7484b89ee9
Author: Shikhar Bhushan 
Date:   2016-08-17T23:29:30Z

KAFKA-4042: prevent `DistributedHerder` thread from dying from 
connector/task lifecycle exceptions

- `worker.startConnector()` and `worker.startTask()` can throw (e.g. 
`ClassNotFoundException`, `ConnectException`), so add catch blocks around those 
calls from the `DistributedHerder` and handle by invoking `onFailure()` which 
updates the `StatusBackingStore`.
- `worker.stopConnector()` throws `ConnectException` if start failed 
causing the connector to not be registered with the worker, so guard with 
`worker.ownsConnector()`
- `worker.stopTasks()` and `worker.awaitStopTasks()` throw 
`ConnectException` if any of them failed to start and are hence not registered 
with the worker, so guard those calls by filtering the task IDs with 
`worker.ownsTask()`




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---