Jenkins build is back to normal : kafka-trunk-jdk8 #1277

2017-02-16 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869703#comment-15869703
 ] 

Vladimír Kleštinec commented on KAFKA-2729:
---

[~elevy] Agreed, we are experiencing the same issue. This is a real blocker and 
we are losing trust in Kafka...

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of under-replicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> under-replicated partitions. Two brokers in the Kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> This occurred for all of the topics on the affected brokers. Both brokers only 
> recovered after a restart. Our own investigation yielded nothing; I was hoping 
> you could shed some light on this issue, and whether it's possibly related to 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.





[jira] [Created] (KAFKA-4770) KStreamAggregationDedupIntegrationTest fails occasionally

2017-02-16 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-4770:
---

 Summary: KStreamAggregationDedupIntegrationTest fails occasionally
 Key: KAFKA-4770
 URL: https://issues.apache.org/jira/browse/KAFKA-4770
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Eno Thereska
 Fix For: 0.10.3.0


org.apache.kafka.streams.integration.KStreamAggregationDedupIntegrationTest > 
shouldGroupByKey FAILED
java.lang.AssertionError: 
Expected: is <[KeyValue(1@1487244179000, 2), KeyValue(2@1487244179000, 2), 
KeyValue(3@1487244179000, 2), KeyValue(4@1487244179000, 2), 
KeyValue(5@1487244179000, 2)]>
 but: was <[KeyValue(1@1487244179000, 2), KeyValue(2@1487244179000, 2), 
KeyValue(3@1487244179000, 2), KeyValue(4@1487244179000, 1), 
KeyValue(5@1487244179000, 1)]>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
at 
org.apache.kafka.streams.integration.KStreamAggregationDedupIntegrationTest.shouldGroupByKey(KStreamAggregationDedupIntegrationTest.java:240)





[GitHub] kafka pull request #2555: Javadoc typo

2017-02-16 Thread astubbs
GitHub user astubbs opened a pull request:

https://github.com/apache/kafka/pull/2555

Javadoc typo



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/astubbs/kafka patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2555.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2555


commit 616d93d191bd1193d927a68ba08fc9489e70484c
Author: Antony Stubbs 
Date:   2017-02-16T12:00:35Z

Javadoc typo






[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869912#comment-15869912
 ] 

Buğra Gedik commented on KAFKA-4767:


I think we are getting close to understanding each other. Here is my problem:

I have code running in a thread, which calls {{KafkaProducer.close()}}. If the 
thread happens to get an interrupt while it is inside {{KafkaProducer.close()}}, 
it ends up leaking the Kafka IO thread. This happens because {{join()}} inside 
{{close()}} ends up getting interrupted. 

The way we join threads in our Java code is as follows:
{code}
try {
    myThread.join();
} catch (final InterruptedException e) {
    // some call to tell myThread to finish up immediately
    myThread.interrupt(); // it could be some other call for your case
    try {
        myThread.join();
    } catch (final InterruptedException e2) {
        // could be something else, such as logging in your case
        throw new MyCustomException(..., e2);
    } finally {
        // make sure we maintain the interrupted status
        Thread.currentThread().interrupt();
    }
}
{code}


> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) {
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





[jira] [Comment Edited] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869912#comment-15869912
 ] 

Buğra Gedik edited comment on KAFKA-4767 at 2/16/17 1:25 PM:
-

I think we are getting close to understanding each other. Here is my problem:

I have code running in a thread, which calls {{KafkaProducer.close()}}. If the 
thread happens to get an interrupt while it is inside 
{{KafkaProducer.close()}}, it ends up leaking the Kafka IO thread. This happens 
because {{join()}} inside {{close()}} ends up getting interrupted. 

This problem is not unique to Kafka. To handle cases like this, the way we 
join threads in our Java code is as follows:
{code}
try {
    myThread.join();
} catch (final InterruptedException e) {
    // some call to tell myThread to finish up immediately
    myThread.interrupt(); // it could be some other call for your case
    try {
        myThread.join();
    } catch (final InterruptedException e2) {
        // could be something else, such as logging in your case
        throw new MyCustomException(..., e2);
    } finally {
        // make sure we maintain the interrupted status
        Thread.currentThread().interrupt();
    }
}
{code}



was (Author: bgedik):
I think we are getting close to understanding each other. Here is my problem:

I have code running in a thread, which calls KafkaProducer.close(). If the 
thread happens to get an interrupt while it is inside ``KafkaProducer.close()`` 
it ends up leaking the Kafka IO thread. This happens because join() inside 
close() ends up getting inerrupted. 

The way we join threads in our Java code is as follows:
{code}
try {
    myThread.join();
} catch (final InterruptedException e) {
    // some call to tell myThread to finish up immediately
    myThread.interrupt(); // it could be some other call for your case
    try {
        myThread.join();
    } catch (final InterruptedException e2) {
        // could be something else, such as logging in your case
        throw new MyCustomException(..., e2);
    } finally {
        // make sure we maintain the interrupted status
        Thread.currentThread().interrupt();
    }
}
{code}


> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) {
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





[jira] [Comment Edited] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869912#comment-15869912
 ] 

Buğra Gedik edited comment on KAFKA-4767 at 2/16/17 1:25 PM:
-

I think we are getting close to understanding each other. Here is my problem:

I have code running in a thread, which calls {{KafkaProducer.close()}}. If the 
thread happens to get an interrupt while it is inside 
{{KafkaProducer.close()}}, it ends up leaking the Kafka IO thread. This happens 
because {{join()}} inside {{close()}} ends up getting interrupted. 

This problem is not unique to Kafka. To handle cases like this, the way we 
join threads in our Java code is as follows:
{code}
try {
    myThread.join();
} catch (final InterruptedException e) {
    // some call to tell myThread to finish up immediately
    myThread.interrupt(); // it could be some other call for your case
    try {
        myThread.join();
    } catch (final InterruptedException e2) {
        // could be something else, such as logging in your case
        throw new MyCustomException(..., e2);
    } finally {
        // make sure we maintain the interrupted status
        Thread.currentThread().interrupt();
    }
}
{code}



was (Author: bgedik):
I think we are getting close to understanding each other. Here is my problem:

I have code running in a thread, which calls {{KafkaProducer.close()}}. If the 
thread happens to get an interrupt while it is inside 
{{KafkaProducer.close()}}, it ends up leaking the Kafka IO thread. This happens 
because {{join()}} inside {{close()}} ends up getting interrupted. 

This problem is not unique to Kafka. To handle cases like this, the way we 
join threads in our Java code is as follows:
{code}
try {
    myThread.join();
} catch (final InterruptedException e) {
    // some call to tell myThread to finish up immediately
    myThread.interrupt(); // it could be some other call for your case
    try {
        myThread.join();
    } catch (final InterruptedException e2) {
        // could be something else, such as logging in your case
        throw new MyCustomException(..., e2);
    } finally {
        // make sure we maintain the interrupted status
        Thread.currentThread().interrupt();
    }
}
{code}


> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) {
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





Re: [kafka-clients] [VOTE] 0.10.2.0 RC2

2017-02-16 Thread Rajini Sivaram
+1 (non-binding)

Ran quick start and some security tests on binary, checked source build and
tests.

Thank you,

Rajini

On Thu, Feb 16, 2017 at 2:04 AM, Jun Rao  wrote:

> Hi, Ewen,
>
> Thanks for running the release. +1. Verified quickstart on 2.10 binary.
>
> Jun
>
> On Tue, Feb 14, 2017 at 10:39 AM, Ewen Cheslack-Postava wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the third candidate for release of Apache Kafka 0.10.2.0.
> >
> > This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> > See the release notes and release plan (https://cwiki.apache.org/conf
> > luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few
> > feature highlights: SASL-SCRAM support, improved client compatibility to
> > allow use of clients newer than the broker, session windows and global
> > tables in the Kafka Streams API, single message transforms in the Kafka
> > Connect framework.
> >
> > Important note: in addition to the artifacts generated using JDK7 for
> > Scala 2.10 and 2.11, this release also includes experimental artifacts
> > built using JDK8 for Scala 2.12.
> >
> > Important code changes since RC1 (non-docs, non system tests):
> >
> > KAFKA-4756; The auto-generated broker id should be passed to
> > MetricReporter.configure
> > KAFKA-4761; Fix producer regression handling small or zero batch size
> >
> > Release notes for the 0.10.2.0 release:
> > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by February 17th 5pm ***
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
> >
> > * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> > 5712b489038b71ed8d5a679856d1dfaa925eadc1
> >
> >
> > * Documentation:
> > http://kafka.apache.org/0102/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/0102/protocol.html
> >
> > * Successful Jenkins builds for the 0.10.2 branch:
> > Unit/integration tests: https://builds.apache.org/job/
> > kafka-0.10.2-jdk7/77/
> > System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.2/
> > 29/
> >
> > /**
> >
> > Thanks,
> > Ewen
> >
>


Re: [kafka-clients] [VOTE] 0.10.2.0 RC2

2017-02-16 Thread Tom Crayford
+1 (non-binding)

I didn't explicitly state my voting status in my previous comment, sorry.

On Thu, Feb 16, 2017 at 1:59 PM, Rajini Sivaram 
wrote:

> +1 (non-binding)
>
> Ran quick start and some security tests on binary, checked source build and
> tests.
>
> Thank you,
>
> Rajini
>
> On Thu, Feb 16, 2017 at 2:04 AM, Jun Rao  wrote:
>
> > Hi, Ewen,
> >
> > Thanks for running the release. +1. Verified quickstart on 2.10 binary.
> >
> > Jun
> >
> > On Tue, Feb 14, 2017 at 10:39 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the third candidate for release of Apache Kafka 0.10.2.0.
> > >
> > > This is a minor version release of Apache Kafka. It includes 19 new
> KIPs.
> > > See the release notes and release plan (https://cwiki.apache.org/conf
> > > luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few
> > > feature highlights: SASL-SCRAM support, improved client compatibility
> to
> > > allow use of clients newer than the broker, session windows and global
> > > tables in the Kafka Streams API, single message transforms in the Kafka
> > > Connect framework.
> > >
> > > Important note: in addition to the artifacts generated using JDK7 for
> > > Scala 2.10 and 2.11, this release also includes experimental artifacts
> > > built using JDK8 for Scala 2.12.
> > >
> > > Important code changes since RC1 (non-docs, non system tests):
> > >
> > > KAFKA-4756; The auto-generated broker id should be passed to
> > > MetricReporter.configure
> > > KAFKA-4761; Fix producer regression handling small or zero batch size
> > >
> > > Release notes for the 0.10.2.0 release:
> > > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by February 17th 5pm ***
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
> > >
> > > * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> > > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> > > 5712b489038b71ed8d5a679856d1dfaa925eadc1
> > >
> > >
> > > * Documentation:
> > > http://kafka.apache.org/0102/documentation.html
> > >
> > > * Protocol:
> > > http://kafka.apache.org/0102/protocol.html
> > >
> > > * Successful Jenkins builds for the 0.10.2 branch:
> > > Unit/integration tests: https://builds.apache.org/job/
> > > kafka-0.10.2-jdk7/77/
> > > System tests: https://jenkins.confluent.io/
> job/system-test-kafka-0.10.2/
> > > 29/
> > >
> > > /**
> > >
> > > Thanks,
> > > Ewen
> > >
> >
>


[jira] [Updated] (KAFKA-4765) org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)

2017-02-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4765:
---
   Resolution: Fixed
Fix Version/s: 0.10.3.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2552
[https://github.com/apache/kafka/pull/2552]

> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
>  and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)
> -
>
> Key: KAFKA-4765
> URL: https://issues.apache.org/jira/browse/KAFKA-4765
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 0.10.1.1
> Environment: All DNS environments that properly forward 127.0.53.53
>Reporter: Armin Braun
> Fix For: 0.10.3.0
>
>
> The test
> {code}
> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
> {code}
> fails on some systems because the following snippet from 
> {code}
> org.apache.kafka.clients.ClientUtils#parseAndValidateAddresses
> {code}
> {code}
> InetSocketAddress address = new InetSocketAddress(host, port);
> if (address.isUnresolved()) {
>     log.warn("Removing server {} from {} as DNS resolution failed for {}",
>             url, CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, host);
> } else {
>     addresses.add(address);
> }
> {code}
> will add the address *some.invalid.hostname.foo.bar* to the addresses list 
> without error since it is resolved to *127.0.53.53* to indicate potential 
> future collision of the _.bar_ tld.
> The same issue applies to a few other test cases that try to intentionally 
> run into broken hostnames.
> This can (and should) be fixed by using broken hostname examples that do not 
> collide. I would suggest just putting a ".local" suffix on all that are 
> currently affected by this.
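>
> A minimal sketch of the resolution behavior described above (the hostname is 
> the example from this report; whether it resolves depends on your DNS setup):
> {code}
> import java.net.InetSocketAddress;
>
> public class CollisionCheck {
>     public static void main(String[] args) {
>         InetSocketAddress address =
>                 new InetSocketAddress("some.invalid.hostname.foo.bar", 9092);
>         if (address.isUnresolved()) {
>             System.out.println("unresolved, as the test expects");
>         } else {
>             // On affected systems this prints /127.0.53.53
>             System.out.println("resolved to " + address.getAddress());
>         }
>     }
> }
> {code}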





[GitHub] kafka pull request #2552: KAFKA-4765: Fixed Intentionally Broken Hosts Resol...

2017-02-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2552




[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-16 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870055#comment-15870055
 ] 

huxi commented on KAFKA-4767:
-

What do you mean by "leaking the IO thread"? Do you mean it could not be shut 
down successfully after interrupting the user thread in which 
KafkaProducer.close was invoked? This should not happen, since 
this.sender.initiateClose() would always be run even when you interrupt the 
user thread. 

In my opinion, interrupting the user thread is no different from invoking 
ioThread.join with a relatively small timeout, because there is still a chance 
to force close the IO thread and wait for it again. That's also why we swallow 
InterruptedException during the first join. 

Does it look good to you though? And for the sake of curiosity, did you 
encounter any cases where the IO thread failed to be shut down?

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) {
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





[jira] [Commented] (KAFKA-4765) org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870051#comment-15870051
 ] 

ASF GitHub Bot commented on KAFKA-4765:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2552


> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
>  and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)
> -
>
> Key: KAFKA-4765
> URL: https://issues.apache.org/jira/browse/KAFKA-4765
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 0.10.1.1
> Environment: All DNS environments that properly forward 127.0.53.53
>Reporter: Armin Braun
> Fix For: 0.10.3.0
>
>
> The test
> {code}
> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
> {code}
> fails on some systems because the following snippet from 
> {code}
> org.apache.kafka.clients.ClientUtils#parseAndValidateAddresses
> {code}
> {code}
> InetSocketAddress address = new InetSocketAddress(host, port);
> if (address.isUnresolved()) {
>     log.warn("Removing server {} from {} as DNS resolution failed for {}",
>             url, CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, host);
> } else {
>     addresses.add(address);
> }
> {code}
> will add the address *some.invalid.hostname.foo.bar* to the addresses list 
> without error since it is resolved to *127.0.53.53* to indicate potential 
> future collision of the _.bar_ tld.
> The same issue applies to a few other test cases that try to intentionally 
> run into broken hostnames.
> This can (and should) be fixed by using broken hostname examples that do not 
> collide. I would suggest just putting a ".local" suffix on all that are 
> currently affected by this.





Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-16 Thread Michael Pearce
Hi Jason,

On point 1), in the message protocol the headers are simply a byte array, 
as with the key or value; this is to clearly demarcate the headers in the 
core message. The header byte array in the core message is then an array 
of key-value pairs. This is what the notation is denoting.

Then this would be I guess in the given notation:

Headers => [KeyLength, Key, ValueLength, Value]
KeyLength => int32 <-- NEW size of the byte[] of the 
serialised header key
Key => bytes <-- NEW serialised string (UTF8) bytes of 
the header key
ValueLength => int32 <-- NEW size of the byte[] of the 
serialised header value
Value => bytes <-- NEW serialised form of the header value

The key length and value length match the way the protocol is defined in 
the core message currently.
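
As an illustration only, a minimal sketch of writing one header under this 
layout (the class and method names here are hypothetical, not from the KIP):

{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class HeaderLayout {
    static ByteBuffer writeHeader(String key, byte[] value) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + keyBytes.length + 4 + value.length);
        buf.putInt(keyBytes.length);   // KeyLength => int32
        buf.put(keyBytes);             // Key => bytes (UTF-8)
        buf.putInt(value.length);      // ValueLength => int32
        buf.put(value);                // Value => bytes
        buf.flip();
        return buf;
    }
}
{code}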




On point 2)
Var-sized ints were discussed much earlier on; in fact I had suggested them 
myself (with Hadoop references). The complexity of this compared to having a 
simpler protocol was argued, and it was agreed it wasn't worth the complexity, 
as all other clients in other languages would need to ensure they're using 
the right var-size algorithm, as there are a few.

On point 3)
We took the attributes (optional) approach because originally there was marked 
concern that headers would cause a message-size overhead for others, who don't 
want them. As such, this is the clean solution to achieve that. If that no 
longer holds, and we don't care that we add 4 bytes of overhead, then I'm 
happy to remove it.

I'm personally in favour of keeping the message as small as possible, so 
people don't get shocks in perf and throughput due to message size unless 
they actively use the feature; as such, I do prefer the attribute bitwise 
feature-flag approach myself, as sketched below.
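
A minimal sketch of that flag check (the bit position and names are 
hypothetical, not something the KIP has fixed):

{code}
public class HeaderAttributes {
    static final byte HEADERS_FLAG = 0x08; // hypothetical bit position

    // A set bit means a headers field follows; messages without headers
    // pay no extra bytes.
    static boolean hasHeaders(byte attributes) {
        return (attributes & HEADERS_FLAG) != 0;
    }

    static byte withHeaders(byte attributes) {
        return (byte) (attributes | HEADERS_FLAG);
    }
}
{code}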




On 16/02/2017, 05:40, "Jason Gustafson"  wrote:

We have proposed a few significant changes to the message format in KIP-98
which now seems likely to pass (perhaps with some iterations on
implementation details). It would be good to try and coordinate the changes
in both of the proposals to make sure they are consistent and compatible.

I think using the attributes to indicate null headers is a reasonable
approach. We have proposed to do the same thing for the message key and
value. That said, I sympathize with Jay's argument. Having multiple ways to
specify a null value increases the overall complexity of the protocol. You
can see this just from the fact that you need the extra verbiage in the
protocol specification in this KIP and in KIP-98 to describe the dependence
between the fields and the attributes. It seems like a slippery slope if
you start allowing different request types to implement the protocol
specification differently.

You can also argue that the messages already are and are likely to remain a
special case. For example, there is currently no generality in how
compressed message sets are represented that would be applicable for other
request types. Some might see this divergence as an unfortunate protocol
deficiency which should be fixed; others might see it as sort of the
inevitability of needing to optimize where it counts most. I'm probably
somewhere in between, but I think we probably all share the intuition that
the protocol should be kept as consistent as possible. With that in mind,
here are a few comments:

1. One thing I found a little odd when reading the current proposal is that
the headers are both represented as an array of bytes and as an array of
key/value pairs. I'd probably suggest something like this:

Headers => [HeaderKey HeaderValue]
 HeaderKey => String
 HeaderValue => Bytes

An array in the Kafka protocol is represented as a 4-byte integer
indicating the number of elements in the array followed by the
serialization of the elements. Unless I'm misunderstanding, what you have
instead is the total size of the headers in bytes followed by the elements.
I'm not sure I see any reason for this inconsistency.
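
For illustration, a hedged sketch of that standard encoding applied to the 
proposed headers, assuming the protocol's int16-length-prefixed String and 
int32-length-prefixed Bytes types (the helper names are made up):

{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class ProtocolArray {
    static ByteBuffer writeHeadersArray(Map<String, byte[]> headers, int capacity) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        buf.putInt(headers.size()); // int32 element count, not total bytes
        for (Map.Entry<String, byte[]> h : headers.entrySet()) {
            byte[] key = h.getKey().getBytes(StandardCharsets.UTF_8);
            buf.putShort((short) key.length); // String: int16 length prefix
            buf.put(key);
            buf.putInt(h.getValue().length);  // Bytes: int32 length prefix
            buf.put(h.getValue());
        }
        buf.flip();
        return buf;
    }
}
{code}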

2. In KIP-98, we've introduced variable-length integer fields. Effectively,
we've enriched (or "complicated" as Jay might say ;) the protocol
specification to include the following types: VarInt, VarLong,
UnsignedVarInt and UnsignedVarLong.

Along with these primitives, we could introduce the following types:

VarSizeArray => NumberOfItems Item1 Item2 .. ItemN
  NumberOfItems => UnsignedVarInt

VarSizeNullableArray => NumberOfItemsOrNull Item1 Item2 .. ItemN
  NumberOfItemsOrNull => VarInt (-1 means null)

And similarly for the `String` and `Bytes` types. These types can save a
considerable amount of space in this proposal because they can be used for
both the number of headers included in the message and the lengths of the
header keys and values. We could do this instead:

Headers => VarSizeArray[HeaderKey
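
For reference, a minimal sketch of a protobuf-style unsigned varint codec of 
the kind being discussed (an assumption for illustration; KIP-98 defines its 
own exact encoding):

{code}
import java.nio.ByteBuffer;

public class VarInts {
    // 7 bits per byte, high bit set on every byte except the last; small
    // values such as header counts and short lengths fit in one byte.
    static void writeUnsignedVarInt(int value, ByteBuffer buf) {
        while ((value & 0xFFFFFF80) != 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buf.put((byte) value);
    }

    static int readUnsignedVarInt(ByteBuffer buf) {
        int value = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }
}
{code}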

[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870121#comment-15870121
 ] 

Buğra Gedik commented on KAFKA-4767:


I understand that it will eventually get shut down. But that does not cut it 
for us. We would like the IO thread to be shut down after {{close}} returns. And 
that does not happen if we get an interrupt during close(). Yes, eventually it 
will go away. However, various software, such as Tomcat, has thread leak 
detectors, and they are tripped by this behavior. Once join is interrupted, we 
have no way of 'waiting' until the IO thread goes away.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) {
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





[jira] [Comment Edited] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870121#comment-15870121
 ] 

Buğra Gedik edited comment on KAFKA-4767 at 2/16/17 3:15 PM:
-

I understand that it will eventually go away. But that does not cut it for us. 
We would like the IO thread to be shut down after {{close}} returns. And that 
does not happen if we get an interrupt during close(). Yes, eventually it will 
go away. However, various software, such as Tomcat, has thread leak detectors, 
and they are tripped by this behavior. Once join is interrupted, we have no 
way of 'waiting' until the IO thread goes away.


was (Author: bgedik):
I understand that it will eventually get shut down. But that does not cut it 
for us. We would like the IO thread to be shut down after {{close}} returns. And 
that does not happen if we get an interrupt during close(). Yes, eventually it 
will go away. However, various software, such as Tomcat, has thread leak 
detectors, and they are tripped by this behavior. Once join is interrupted, we 
have no way of 'waiting' until the IO thread goes away.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) {
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





[jira] [Comment Edited] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870121#comment-15870121
 ] 

Buğra Gedik edited comment on KAFKA-4767 at 2/16/17 3:16 PM:
-

I understand that it will eventually go away. But that does not cut it for us. 
We would like the IO thread to be shut down after {{close}} returns. And that 
does not happen if we get an interrupt during close(). Yes, eventually it will 
go away. However, various software, such as Tomcat, has thread leak detectors, 
and they are tripped by this behavior. Once join is interrupted, we (client 
code that uses KafkaProducer) have no way of 'waiting' until the IO thread goes 
away.


was (Author: bgedik):
I understand that it will eventually go away. But that does not cut it for us. 
We would like the IO thread to be shut down after {{close}} returns. And that 
does not happen if we get an interrupt during close(). Yes, eventually it will 
go away. However, various software, such as Tomcat, has thread leak detectors, 
and they are tripped by this behavior. Once join is interrupted, we have no 
way of 'waiting' until the IO thread goes away.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) {
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





[jira] [Created] (KAFKA-4771) NPEs in KerberosLogin due to autoboxing

2017-02-16 Thread Colm O hEigeartaigh (JIRA)
Colm O hEigeartaigh created KAFKA-4771:
--

 Summary: NPEs in KerberosLogin due to autoboxing
 Key: KAFKA-4771
 URL: https://issues.apache.org/jira/browse/KAFKA-4771
 Project: Kafka
  Issue Type: Bug
Reporter: Colm O hEigeartaigh
Priority: Trivial


There are a bunch of NullPointerExceptions possible in KerberosLogin due to 
autoboxing if the configuration does not have specified values.
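
A minimal sketch of the failure mode (the config key is made up for 
illustration):

{code}
import java.util.HashMap;
import java.util.Map;

public class AutoboxNpe {
    public static void main(String[] args) {
        Map<String, Short> configs = new HashMap<>();
        // Option left unset, so get() returns null; unboxing the null
        // Short into a primitive short throws NullPointerException.
        short minTime = configs.get("some.unset.option");
    }
}
{code}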





Apache Kafka Docker official image

2017-02-16 Thread Gianluca Privitera
Hi, 
I'm currently proposing an official image for Apache Kafka in the Docker 
library (https://github.com/docker-library/official-images/pull/2627).
I wanted to know if someone from Kafka upstream is interested in taking over, 
or if you are OK with me being the maintainer of the image.

Let me know so I can speed up the process of the image approval. 

Thanks

Gianluca Privitera

[GitHub] kafka pull request #2556: KAFKA-4771 - NPEs in KerberosLogin due to autoboxi...

2017-02-16 Thread coheigea
GitHub user coheigea opened a pull request:

https://github.com/apache/kafka/pull/2556

KAFKA-4771 - NPEs in KerberosLogin due to autoboxing



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/coheigea/kafka kafka-4771

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2556.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2556


commit 857294aefd8378f09eb2ca99547ac2c5b0b75aa9
Author: Colm O hEigeartaigh 
Date:   2017-02-16T16:38:17Z

KAFKA-4771 - NPEs in KerberosLogin due to autoboxing






[jira] [Commented] (KAFKA-4771) NPEs in KerberosLogin due to autoboxing

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870246#comment-15870246
 ] 

ASF GitHub Bot commented on KAFKA-4771:
---

GitHub user coheigea opened a pull request:

https://github.com/apache/kafka/pull/2556

KAFKA-4771 - NPEs in KerberosLogin due to autoboxing



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/coheigea/kafka kafka-4771

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2556.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2556


commit 857294aefd8378f09eb2ca99547ac2c5b0b75aa9
Author: Colm O hEigeartaigh 
Date:   2017-02-16T16:38:17Z

KAFKA-4771 - NPEs in KerberosLogin due to autoboxing




> NPEs in KerberosLogin due to autoboxing
> ---
>
> Key: KAFKA-4771
> URL: https://issues.apache.org/jira/browse/KAFKA-4771
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colm O hEigeartaigh
>Priority: Trivial
>
> There are a bunch of NullPointerExceptions possible in KerberosLogin due to 
> autoboxing if the configuration does not have specified values.





[jira] [Updated] (KAFKA-4159) Allow overriding producer & consumer properties at the connector level

2017-02-16 Thread Stephen Durfey (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Durfey updated KAFKA-4159:
--
Status: Patch Available  (was: Open)

> Allow overriding producer & consumer properties at the connector level
> --
>
> Key: KAFKA-4159
> URL: https://issues.apache.org/jira/browse/KAFKA-4159
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Stephen Durfey
>
> As an example use case: overriding a sink connector's consumer's partition 
> assignment strategy.





[jira] [Assigned] (KAFKA-4159) Allow overriding producer & consumer properties at the connector level

2017-02-16 Thread Stephen Durfey (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Durfey reassigned KAFKA-4159:
-

Assignee: Stephen Durfey

> Allow overriding producer & consumer properties at the connector level
> --
>
> Key: KAFKA-4159
> URL: https://issues.apache.org/jira/browse/KAFKA-4159
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Stephen Durfey
>
> As an example use case: overriding a sink connector's consumer's partition 
> assignment strategy.





[jira] [Created] (KAFKA-4772) Exploit #peek to implement #print() and other methods

2017-02-16 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-4772:
--

 Summary: Exploit #peek to implement #print() and other methods
 Key: KAFKA-4772
 URL: https://issues.apache.org/jira/browse/KAFKA-4772
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Matthias J. Sax
Priority: Minor


From: https://github.com/apache/kafka/pull/2493#pullrequestreview-22157555

Things that I can think of:

- print / writeAsText can be a special impl of peek; KStreamPrint etc. can be 
removed (see the sketch below).
- consider collapsing KStreamPeek and KStreamForeach, with a flag parameter 
indicating whether the acted-on key-value pair should still be forwarded.
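
A hedged sketch of the first idea in the Streams DSL (API names as in later 
Kafka releases; this is not the implemented change):

{code}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class PeekAsPrint {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> stream = builder.stream("input-topic");
        // print() as a special case of peek(): observe and forward unchanged
        stream.peek((key, value) -> System.out.println(key + ", " + value));
        // foreach() is the same action without forwarding downstream,
        // which is the flag the second bullet proposes to add
        stream.foreach((key, value) -> System.out.println(key + ", " + value));
    }
}
{code}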







Re: [VOTE] 0.10.2.0 RC2

2017-02-16 Thread Gwen Shapira
+1 (binding).

Verified signatures, ran unit tests, ran quickstart.

Nice release :)

On Tue, Feb 14, 2017 at 10:39 AM, Ewen Cheslack-Postava
 wrote:
> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 0.10.2.0.
>
> This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> See the release notes and release plan (https://cwiki.apache.org/conf
> luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few feature
> highlights: SASL-SCRAM support, improved client compatibility to allow use
> of clients newer than the broker, session windows and global tables in the
> Kafka Streams API, single message transforms in the Kafka Connect framework.
>
> Important note: in addition to the artifacts generated using JDK7 for Scala
> 2.10 and 2.11, this release also includes experimental artifacts built
> using JDK8 for Scala 2.12.
>
> Important code changes since RC1 (non-docs, non system tests):
>
> KAFKA-4756; The auto-generated broker id should be passed to
> MetricReporter.configure
> KAFKA-4761; Fix producer regression handling small or zero batch size
>
> Release notes for the 0.10.2.0 release:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by February 17th 5pm ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
>
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=5712b489038b71ed8d5a679856d1dfaa925eadc1
>
>
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
>
> * Successful Jenkins builds for the 0.10.2 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-0.10.2-jdk7/77/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.2/29/
>
> /**
>
> Thanks,
> Ewen



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [VOTE] 0.10.2.0 RC2

2017-02-16 Thread Neha Narkhede
+1 (binding)

Verified signatures, quickstart, docs.

Thanks for running the release, Ewen!

On Thu, Feb 16, 2017 at 9:42 AM Gwen Shapira  wrote:

> +1 (binding).
>
> Verified signatures, ran unit tests, ran quickstart.
>
> Nice release :)
>
> On Tue, Feb 14, 2017 at 10:39 AM, Ewen Cheslack-Postava
>  wrote:
> > Hello Kafka users, developers and client-developers,
> >
> > This is the third candidate for release of Apache Kafka 0.10.2.0.
> >
> > This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> > See the release notes and release plan (https://cwiki.apache.org/conf
> > luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few
> feature
> > highlights: SASL-SCRAM support, improved client compatibility to allow
> use
> > of clients newer than the broker, session windows and global tables in
> the
> > Kafka Streams API, single message transforms in the Kafka Connect
> framework.
> >
> > Important note: in addition to the artifacts generated using JDK7 for
> Scala
> > 2.10 and 2.11, this release also includes experimental artifacts built
> > using JDK8 for Scala 2.12.
> >
> > Important code changes since RC1 (non-docs, non system tests):
> >
> > KAFKA-4756; The auto-generated broker id should be passed to
> > MetricReporter.configure
> > KAFKA-4761; Fix producer regression handling small or zero batch size
> >
> > Release notes for the 0.10.2.0 release:
> > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by February 17th 5pm ***
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
> >
> > * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> >
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=5712b489038b71ed8d5a679856d1dfaa925eadc1
> >
> >
> > * Documentation:
> > http://kafka.apache.org/0102/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/0102/protocol.html
> >
> > * Successful Jenkins builds for the 0.10.2 branch:
> > Unit/integration tests:
> https://builds.apache.org/job/kafka-0.10.2-jdk7/77/
> > System tests:
> https://jenkins.confluent.io/job/system-test-kafka-0.10.2/29/
> >
> > /**
> >
> > Thanks,
> > Ewen
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>
-- 
Thanks,
Neha


Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-16 Thread Jason Gustafson
Hey Michael,

Hmm, I guess the point of representing it as bytes is to allow the broker
to pass it through opaquely? Is the cost of parsing them a concern, or are
we simply trying to ensure that the broker stays agnostic to the format?

On varints, I think adding support for them makes less sense for an
isolated use case, but as part of a more holistic change (such as what we
have proposed in KIP-98), I think they are justifiable. If we add them,
then the need to use attributes becomes quite a bit weaker, right? The
other thing I find slightly odd is the fact that null headers have no actual
semantic meaning for the message (unlike null keys and values). It is just
a space optimization. It seems a bit better to always use size 0 to
indicate having no headers.

Overall, the main point is ensuring that the message schema remains
consistent, either within the larger protocol, or at a minimum within the
message itself.

-Jason

On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce 
wrote:

> Hi Jason,
>
> On point 1) in the message protocol the headers are simply a byte array,
> as with the key or value; this is to clearly demarcate the headers in the
> core message. The header byte array in the core message is then an array of
> key-value pairs. This is what the notation is denoting.
>
> Then this would be I guess in the given notation:
>
> Headers => [KeyLength, Key, ValueLength, Value]
> KeyLength => int32 <-- NEW size of the byte[] of the
> serialised header key
> Key => bytes <-- NEW serialised string (UTF8)
> bytes of the header key
> ValueLength => int32 <-- NEW size of the byte[] of the
> serialised header value
> Value => bytes <-- NEW serialised form of the header
> value
>
> The key length and value length match the way the protocol is
> defined in the core message currently.
>
>
>
>
> On point 2)
> Var-sized ints were discussed much earlier on; in fact I had
> suggested them myself (with Hadoop references). The complexity of this
> compared to having a simpler protocol was argued, and it was agreed it wasn't
> worth the complexity, as all other clients in other languages would need to
> ensure they're using the right var-size algorithm, as there are a few.
>
> On point 3)
> We took the attributes (optional) approach because originally there was marked
> concern that headers would cause a message-size overhead for others, who
> don't want them. As such this is the clean solution to achieve that. If
> that no longer holds, and we don't care that we add 4 bytes of overhead, then
> I'm happy to remove it.
>
> I'm personally in favour of keeping the message as small as possible so
> people don't get shocks in perf and throughput due to message size,
> unless they actively use the feature; as such I do prefer the attribute
> bitwise feature-flag approach myself.
>
>
>
>
> On 16/02/2017, 05:40, "Jason Gustafson"  wrote:
>
> We have proposed a few significant changes to the message format in
> KIP-98
> which now seems likely to pass (perhaps with some iterations on
> implementation details). It would be good to try and coordinate the
> changes
> in both of the proposals to make sure they are consistent and
> compatible.
>
> I think using the attributes to indicate null headers is a reasonable
> approach. We have proposed to do the same thing for the message key and
> value. That said, I sympathize with Jay's argument. Having multiple
> ways to
> specify a null value increases the overall complexity of the protocol.
> You
> can see this just from the fact that you need the extra verbiage in the
> protocol specification in this KIP and in KIP-98 to describe the
> dependence
> between the fields and the attributes. It seems like a slippery slope
> if
> you start allowing different request types to implement the protocol
> specification differently.
>
> You can also argue that the messages already are and are likely to
> remain a
> special case. For example, there is currently no generality in how
> compressed message sets are represented that would be applicable for
> other
> request types. Some might see this divergence as an unfortunate
> protocol
> deficiency which should be fixed; others might see it as sort of the
> inevitability of needing to optimize where it counts most. I'm probably
> somewhere in between, but I think we probably all share the intuition
> that
> the protocol should be kept as consistent as possible. With that in
> mind,
> here are a few comments:
>
> 1. One thing I found a little odd when reading the current proposal is
> that
> the headers are both represented as an array of bytes and as an array
> of
> key/value pairs. I'd probably suggest something like this:
>
> Headers => [HeaderKey HeaderValue]
>  HeaderKey => String
>  HeaderValue => Bytes
>
> An array in the Kafka protocol is represented as a 4-byte integer
> i

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-16 Thread Colin McCabe
On Mon, Feb 13, 2017, at 23:04, radai wrote:
> 1. making the client Closeable/AutoCloseable would allow try (Client =
> ...) {} without the need to finally close.

Good idea... let's make the interface extend AutoCloseable.
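
A minimal sketch of what that buys callers, with placeholder names since the 
KIP is still settling on the API:

{code}
// Hypothetical admin interface for illustration only.
interface Admin extends AutoCloseable {
    void createTopic(String name, int partitions, short replicationFactor);
    @Override void close(); // narrow the throws clause for convenience
}

class Example {
    static void run(Admin client) {
        // try-with-resources: close() runs even if createTopic throws
        try (Admin admin = client) {
            admin.createTopic("my-topic", 3, (short) 2);
        }
    }
}
{code}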

> 
> 2. a "stream processing unit" (producer + consumer) currently holds 2 open
> sockets to every broker it interacts with, because producer and consumer
> don't share the network stack. If we use the admin API to auto-cleanup on
> commit for intermediate pipelines (which is one of our use cases) this
> figure goes up to 3 sockets per unit of processing per broker. Beyond
> becoming a scalability issue this (I think) might also introduce annoying
> bugs due to some (but not all) of these connections being down. This is
> not an issue of this KIP though.

Right, that is out of scope for this KIP, which is just about public
APIs.

It's worth thinking about for the future, though.

best,
Colin

> 
> On Mon, Feb 13, 2017 at 11:51 AM, Colin McCabe 
> wrote:
> 
> > On Sun, Feb 12, 2017, at 09:21, Jay Kreps wrote:
> > > Hey Colin,
> > >
> > > Thanks for the hard work on this. I know going back and forth on APIs is
> > > kind of frustrating but we're at the point where these things live long
> > > enough and are used by enough people that it is worth the pain. I'm sure
> > > it'll come down in the right place eventually. A couple things I've found
> > > helped in the past:
> > >
> > >1. The burden of evidence needs to fall on the complicator. i.e. if
> > >person X thinks the api should be async they need to produce a set of
> > >common use cases that require this. Otherwise you are perpetually
> > >having to
> > >think "we might need x". I think it is good to have a rule of "simple
> > >until
> > >proven insufficient".
> > >2. Make sure we frame things for the intended audience. At this point
> > >our apis get used by a very broad set of Java engineers. This is a
> > >very
> > >different audience from our developer mailing list. These people code
> > >for a
> > >living not necessarily as a passion, and may not understand details of
> > >the
> > >internals of our system or even basic things like multi-threaded
> > >programming. I don't think this means we want to dumb things down, but
> > >rather try really hard to make things truly simple when possible.
> > >
> > > Okay here were a couple of comments:
> > >
> > >1. Conceptually what is a TopicContext? I think it means something
> > >like
> > >TopicAdmin? It is not literally context about Topics right? What is
> > >the
> > >relationship of Contexts to clients? Is there a threadsafety
> > >difference?
> > >Would be nice to not have to think about this, this is what I mean by
> > >"conceptual weight". We introduce a new concept that is a bit nebulous
> > >that
> > >I have to figure out to use what could be a simple api. I'm sure
> > >you've
> > >been through this experience before where you have these various
> > >objects
> > >and you're trying to figure out what they represent (the connection to
> > >the
> > >server? the information to create a connection? a request session?).
> >
> > The intention was to provide some grouping of methods, and also a place
> > to put request parameters which were often set to defaults rather than
> > being explicitly set.  If it seems complex, we can certainly get rid of
> > it.
> >
> > >2. We've tried to avoid the Impl naming convention. In general the
> > >rule
> > >has been if there is only going to be one implementation you don't
> > >need an
> > >interface. If there will be multiple, distinguish it from the others.
> > >The
> > >other clients follow this pattern: Producer, KafkaProducer,
> > >MockProducer;
> > >Consumer, KafkaConsumer, MockConsumer.
> >
> > Good point.  Let's change the interface to KafkaAdminClient, and the
> > implementation to AdminClient.
> >
> > >3. We generally don't use setters or getters as a naming convention. I
> > >personally think mutating the setting in place seems kind of like late
> > >90s
> > >Java style. I think it likely has thread-safety issues. i.e. even if
> > >it is
> > >volatile you may not get the value you just set if there is another
> > >thread... I actually really liked what you described as your original
> > >idea
> > >of having a single parameter object like CreateTopicRequest that holds
> > >all
> > >these parameters and defaults. This lets you evolve the api with all
> > >the
> > >various combinations of arguments without overloading insanity. After
> > >doing
> > >literally tens of thousands of remote APIs at LinkedIn we eventually
> > >converged on a rule, which is ultimately every remote api needs a
> > >single
> > >argument object you can add to over time and it must be batched. Which
> > >brings me to my next 

Re: [VOTE] 0.10.2.0 RC2

2017-02-16 Thread Sriram Subramanian
+1
Verified signatures, tests and docs

On Thu, Feb 16, 2017 at 9:44 AM, Neha Narkhede  wrote:

> +1 (binding)
>
> Verified signatures, quickstart, docs.
>
> Thanks for running the release, Ewen!
>
> On Thu, Feb 16, 2017 at 9:42 AM Gwen Shapira  wrote:
>
> > +1 (binding).
> >
> > Verified signatures, ran unit tests, ran quickstart.
> >
> > Nice release :)
> >
> > On Tue, Feb 14, 2017 at 10:39 AM, Ewen Cheslack-Postava
> >  wrote:
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the third candidate for release of Apache Kafka 0.10.2.0.
> > >
> > > This is a minor version release of Apache Kafka. It includes 19 new
> KIPs.
> > > See the release notes and release plan (https://cwiki.apache.org/conf
> > > luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few
> > feature
> > > highlights: SASL-SCRAM support, improved client compatibility to allow
> > use
> > > of clients newer than the broker, session windows and global tables in
> > the
> > > Kafka Streams API, single message transforms in the Kafka Connect
> > framework.
> > >
> > > Important note: in addition to the artifacts generated using JDK7 for
> > Scala
> > > 2.10 and 2.11, this release also includes experimental artifacts built
> > > using JDK8 for Scala 2.12.
> > >
> > > Important code changes since RC1 (non-docs, non system tests):
> > >
> > > KAFKA-4756; The auto-generated broker id should be passed to
> > > MetricReporter.configure
> > > KAFKA-4761; Fix producer regression handling small or zero batch size
> > >
> > > Release notes for the 0.10.2.0 release:
> > > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by February 17th 5pm ***
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
> > >
> > > * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> > >
> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 5712b489038b71ed8d5a679856d1dfaa925eadc1
> > >
> > >
> > > * Documentation:
> > > http://kafka.apache.org/0102/documentation.html
> > >
> > > * Protocol:
> > > http://kafka.apache.org/0102/protocol.html
> > >
> > > * Successful Jenkins builds for the 0.10.2 branch:
> > > Unit/integration tests:
> > https://builds.apache.org/job/kafka-0.10.2-jdk7/77/
> > > System tests:
> > https://jenkins.confluent.io/job/system-test-kafka-0.10.2/29/
> > >
> > > /**
> > >
> > > Thanks,
> > > Ewen
> >
> >
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 <(650)%20450-2760> | @gwenshap
> > Follow us: Twitter | blog
> >
> --
> Thanks,
> Neha
>


Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-16 Thread Colin McCabe
Hi all,

So I think people have made some very good points so far.  There seems
to be agreement that we need to have explicit batch APIs for the sake of
efficiency, so I added that back.

Contexts seem a little more complex than we thought, so I removed that
from the proposal.

I removed the Impl class.  Instead, we now have a KafkaAdminClient
interface and an AdminClient implementation.  I think this matches our
other code better, as Jay commented.

Each call now has an "Options" object that is passed in.  This will
allow us to easily add new parameters to the calls without having tons
of function overloads.  Similarly, each call now has a Results object,
which will let us easily extend the results we are returning if needed.

Many people made the point that Java 7 Futures are not that useful, but
Java 8 CompletableFutures are.  With CompletableFutures, you can chain
calls, adapt them, join them-- basically all the stuff people are doing
in Node.js and Twisted Python.  Java 7 Futures don't really let you do
anything but poll for a value or block.  So I felt that it was better to
just go with a CompletableFuture-based API.

People also made the point that they would like an easy API for waiting
on complete success of a batch call.  For example, an API that would
fail if even one topic wasn't created in createTopics.  So I came up
with Result objects that provide multiple futures that you can wait on. 
You can wait on a future that fires when everything is complete, or you
can wait on futures for individual topics being created.
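
To make the preceding paragraphs concrete, here is a hedged sketch of what the Options/Results plus CompletableFuture style could look like; the names (all(), values(), CreateTopicsOptions) are illustrative assumptions, not the final API:

{code}
// Hypothetical sketch only; assumes admin, newTopics, and log already exist.
CreateTopicsResult result = admin.createTopics(newTopics, new CreateTopicsOptions());

// Wait on complete success of the whole batch:
result.all().thenRun(() -> System.out.println("all topics created"));

// Or wait on the future for one individual topic:
result.values().get("my-topic")
      .exceptionally(e -> { log.error("my-topic failed", e); return null; });
{code}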

I updated the wiki, so please take a look.  Note that this new API
requires JDK8.  It seems like JDK8 is coming soon, though, and the
disadvantages of sticking to Java 7 are pretty big here, I think.

best,
Colin


On Mon, Feb 13, 2017, at 11:51, Colin McCabe wrote:
> On Sun, Feb 12, 2017, at 09:21, Jay Kreps wrote:
> > Hey Colin,
> > 
> > Thanks for the hard work on this. I know going back and forth on APIs is
> > kind of frustrating but we're at the point where these things live long
> > enough and are used by enough people that it is worth the pain. I'm sure
> > it'll come down in the right place eventually. A couple things I've found
> > helped in the past:
> > 
> >1. The burden of evidence needs to fall on the complicator. i.e. if
> >person X thinks the api should be async they need to produce a set of
> >common use cases that require this. Otherwise you are perpetually
> >having to
> >think "we might need x". I think it is good to have a rule of "simple
> >until
> >proven insufficient".
> >2. Make sure we frame things for the intended audience. At this point
> >our apis get used by a very broad set of Java engineers. This is a
> >very
> >different audience from our developer mailing list. These people code
> >for a
> >living not necessarily as a passion, and may not understand details of
> >the
> >internals of our system or even basic things like multi-threaded
> >programming. I don't think this means we want to dumb things down, but
> >rather try really hard to make things truly simple when possible.
> > 
> > Okay here were a couple of comments:
> > 
> >1. Conceptually what is a TopicContext? I think it means something
> >like
> >TopicAdmin? It is not literally context about Topics right? What is
> >the
> >relationship of Contexts to clients? Is there a threadsafety
> >difference?
> >Would be nice to not have to think about this, this is what I mean by
> >"conceptual weight". We introduce a new concept that is a bit nebulous
> >that
> >I have to figure out to use what could be a simple api. I'm sure
> >you've
> >been through this experience before where you have these various
> >objects
> >and you're trying to figure out what they represent (the connection to
> >the
> >server? the information to create a connection? a request session?).
> 
> The intention was to provide some grouping of methods, and also a place
> to put request parameters which were often set to defaults rather than
> being explicitly set.  If it seems complex, we can certainly get rid of
> it.
> 
> >2. We've tried to avoid the Impl naming convention. In general the
> >rule
> >has been if there is only going to be one implementation you don't
> >need an
> >interface. If there will be multiple, distinguish it from the others.
> >The
> >other clients follow this pattern: Producer, KafkaProducer,
> >MockProducer;
> >Consumer, KafkaConsumer, MockConsumer.
> 
> Good point.  Let's change the interface to KafkaAdminClient, and the
> implementation to AdminClient.
> 
> >3. We generally don't use setters or getters as a naming convention. I
> >personally think mutating the setting in place seems kind of like late
> >90s
> >Java style. I think it likely has thread-safety issues. i.e. even if
> >it is
> >volatile you may not get 

[jira] [Commented] (KAFKA-4686) Null Message payload is shutting down broker

2017-02-16 Thread Hannu Valtonen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870447#comment-15870447
 ] 

Hannu Valtonen commented on KAFKA-4686:
---

We've periodically seen the same on Kafka 0.10.1.1 with the kafka-python 
client. The messages in the topics in question have been compressed with snappy.

> Null Message payload is shutting down broker
> 
>
> Key: KAFKA-4686
> URL: https://issues.apache.org/jira/browse/KAFKA-4686
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
> Environment: Amazon Linux AMI release 2016.03 kernel 
> 4.4.19-29.55.amzn1.x86_64
>Reporter: Rodrigo Queiroz Saramago
> Fix For: 0.10.3.0
>
> Attachments: KAFKA-4686-NullMessagePayloadError.tar.xz, 
> kafkaServer.out
>
>
> Hello, I have a test environment with 3 brokers and 1 zookeeper node, in 
> which clients connect using two-way SSL authentication. I use Kafka version 
> 0.10.1.1. The system works as expected for a while, but if a broker goes 
> down and is then restarted, something gets corrupted and it is not possible to 
> start the broker again; it always fails with the same error. What does this 
> error mean? What can I do in this case? Is this the expected behavior?
> [2017-01-23 07:03:28,927] ERROR There was an error in one of the threads 
> during logs loading: kafka.common.KafkaException: Message payload is null: 
> Message(magic = 0, attributes = 1, crc = 4122289508, key = null, payload = 
> null) (kafka.log.LogManager)
> [2017-01-23 07:03:28,929] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes = 1, crc = 4122289508, key = null, payload = null)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.<init>(ByteBufferMessageSet.scala:90)
> at 
> kafka.message.ByteBufferMessageSet$.deepIterator(ByteBufferMessageSet.scala:85)
> at kafka.message.MessageAndOffset.firstOffset(MessageAndOffset.scala:33)
> at kafka.log.LogSegment.recover(LogSegment.scala:223)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> at kafka.log.Log.loadSegments(Log.scala:179)
> at kafka.log.Log.<init>(Log.scala:108)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:151)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:58)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> [2017-01-23 07:03:28,946] INFO shutting down (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,949] INFO Terminate ZkClient event thread. 
> (org.I0Itec.zkclient.ZkEventThread)
> [2017-01-23 07:03:28,954] INFO EventThread shut down for session: 
> 0x159bd458ae70008 (org.apache.zookeeper.ClientCnxn)
> [2017-01-23 07:03:28,954] INFO Session: 0x159bd458ae70008 closed 
> (org.apache.zookeeper.ZooKeeper)
> [2017-01-23 07:03:28,957] INFO shut down completed (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,959] FATAL Fatal error during KafkaServerStartable 
> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes = 1, crc = 4122289508, key = null, payload = null)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.<init>(ByteBufferMessageSet.scala:90)
> at 
> kafka.message.ByteBufferMessageSet$.deepIterator(ByteBufferMessageSet.scala:85)
> at kafka.message.MessageAndOffset.firstOffset(MessageAndOffset.scala:33)
> at kafka.log.LogSegment.recover(LogSegment.scala:223)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(Trav

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

2017-02-16 Thread Matthias J. Sax
Jeyhun,

can you please add the KIP to this table:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-KIPsunderdiscussion

and to this list:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams

Thanks!


-Matthias

On 2/14/17 5:36 PM, Matthias J. Sax wrote:
> Mathieu,
> 
> I personally agree with your observation, and we have plans to submit a
> KIP like this. If you want to drive this discussion feel free to start
> the KIP by yourself!
> 
> Having said that, for this KIP we might want to focus the discussion the
> the actual feature that gets added: allowing to specify different
> TS-Extractor for different inputs.
> 
> 
> 
> -Matthias
> 
> On 2/14/17 4:54 PM, Mathieu Fenniak wrote:
>> Hi Jeyhun,
>>
>> This KIP might not be the appropriate time, but my first thought reading it
>> is that it might make sense to introduce a builder-style API rather than
>> adding a mix of new method overloads with independent optional parameters.
>> :-)
>>
>> eg. stream(), table(), globalTable(), addSource(), could all accept a
>> "TopicReference" parameter that can be built like:
>> TopicReference("my-topic").keySerde(...).valueSerde(...).autoOffsetReset(...).timestampExtractor(...).build().
>>
>> Mathieu
>>
>>
>> On Tue, Feb 14, 2017 at 5:31 PM, Jeyhun Karimov 
>> wrote:
>>
>>> Dear community,
>>>
>>> I want to share the KIP-123 [1] which is based on issue KAFKA-4144 [2]. You
>>> can check the PR in [3].
>>>
>>> I would like to get your comments.
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68714788
>>> [2] https://issues.apache.org/jira/browse/KAFKA-4144
>>> [3] https://github.com/apache/kafka/pull/2466
>>>
>>>
>>> Cheers,
>>> Jeyhun
>>> --
>>> -Cheers
>>>
>>> Jeyhun
>>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [VOTE] 0.10.2.0 RC2

2017-02-16 Thread Vahid S Hashemian
+1 (non-binding)

Built from the source and ran the quickstart successfully on Ubuntu, Mac, 
Windows (64 bit).

Thank you Ewen for running the release.

--Vahid



From:   Ewen Cheslack-Postava 
To: dev@kafka.apache.org, "us...@kafka.apache.org" 
, "kafka-clie...@googlegroups.com" 

Date:   02/14/2017 10:40 AM
Subject:[VOTE] 0.10.2.0 RC2



Hello Kafka users, developers and client-developers,

This is the third candidate for release of Apache Kafka 0.10.2.0.

This is a minor version release of Apache Kafka. It includes 19 new KIPs.
See the release notes and release plan (https://cwiki.apache.org/conf
luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few 
feature
highlights: SASL-SCRAM support, improved client compatibility to allow use
of clients newer than the broker, session windows and global tables in the
Kafka Streams API, single message transforms in the Kafka Connect 
framework.

Important note: in addition to the artifacts generated using JDK7 for 
Scala
2.10 and 2.11, this release also includes experimental artifacts built
using JDK8 for Scala 2.12.

Important code changes since RC1 (non-docs, non system tests):

KAFKA-4756; The auto-generated broker id should be passed to
MetricReporter.configure
KAFKA-4761; Fix producer regression handling small or zero batch size

Release notes for the 0.10.2.0 release:
http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html

*** Please download, test and vote by February 17th 5pm ***

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/

* Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=5712b489038b71ed8d5a679856d1dfaa925eadc1



* Documentation:
http://kafka.apache.org/0102/documentation.html

* Protocol:
http://kafka.apache.org/0102/protocol.html

* Successful Jenkins builds for the 0.10.2 branch:
Unit/integration tests: 
https://builds.apache.org/job/kafka-0.10.2-jdk7/77/
System tests: 
https://jenkins.confluent.io/job/system-test-kafka-0.10.2/29/

/**

Thanks,
Ewen






[jira] [Commented] (KAFKA-4686) Null Message payload is shutting down broker

2017-02-16 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870458#comment-15870458
 ] 

Jason Gustafson commented on KAFKA-4686:


[~Ormod] Thanks. Do you recall if the topic experiencing this problem was using 
compaction or not? How were you able to resolve the issue?

> Null Message payload is shutting down broker
> 
>
> Key: KAFKA-4686
> URL: https://issues.apache.org/jira/browse/KAFKA-4686
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
> Environment: Amazon Linux AMI release 2016.03 kernel 
> 4.4.19-29.55.amzn1.x86_64
>Reporter: Rodrigo Queiroz Saramago
> Fix For: 0.10.3.0
>
> Attachments: KAFKA-4686-NullMessagePayloadError.tar.xz, 
> kafkaServer.out
>
>
> Hello, I have a test environment with 3 brokers and 1 zookeeper node, in 
> which clients connect using two-way SSL authentication. I use Kafka version 
> 0.10.1.1. The system works as expected for a while, but if a broker goes 
> down and is then restarted, something gets corrupted and it is not possible to 
> start the broker again; it always fails with the same error. What does this 
> error mean? What can I do in this case? Is this the expected behavior?
> [2017-01-23 07:03:28,927] ERROR There was an error in one of the threads 
> during logs loading: kafka.common.KafkaException: Message payload is null: 
> Message(magic = 0, attributes = 1, crc = 4122289508, key = null, payload = 
> null) (kafka.log.LogManager)
> [2017-01-23 07:03:28,929] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes = 1, crc = 4122289508, key = null, payload = null)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.<init>(ByteBufferMessageSet.scala:90)
> at 
> kafka.message.ByteBufferMessageSet$.deepIterator(ByteBufferMessageSet.scala:85)
> at kafka.message.MessageAndOffset.firstOffset(MessageAndOffset.scala:33)
> at kafka.log.LogSegment.recover(LogSegment.scala:223)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> at kafka.log.Log.loadSegments(Log.scala:179)
> at kafka.log.Log.<init>(Log.scala:108)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:151)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:58)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> [2017-01-23 07:03:28,946] INFO shutting down (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,949] INFO Terminate ZkClient event thread. 
> (org.I0Itec.zkclient.ZkEventThread)
> [2017-01-23 07:03:28,954] INFO EventThread shut down for session: 
> 0x159bd458ae70008 (org.apache.zookeeper.ClientCnxn)
> [2017-01-23 07:03:28,954] INFO Session: 0x159bd458ae70008 closed 
> (org.apache.zookeeper.ZooKeeper)
> [2017-01-23 07:03:28,957] INFO shut down completed (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,959] FATAL Fatal error during KafkaServerStartable 
> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes = 1, crc = 4122289508, key = null, payload = null)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.<init>(ByteBufferMessageSet.scala:90)
> at 
> kafka.message.ByteBufferMessageSet$.deepIterator(ByteBufferMessageSet.scala:85)
> at kafka.message.MessageAndOffset.firstOffset(MessageAndOffset.scala:33)
> at kafka.log.LogSegment.recover(LogSegment.scala:223)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.s

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-16 Thread Colin McCabe
+1 for varints here-- it would save quite a bit of space.  They are
pretty quick to implement as well.
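
For reference, a minimal sketch of the kind of unsigned varint encoding being discussed (protobuf-style, 7 bits per byte with a continuation bit); this is an illustration, not the encoding the KIP specifies:

{code}
import java.io.ByteArrayOutputStream;

// Protobuf-style unsigned varint: 7 payload bits per byte, high bit set on
// every byte except the last, so small values need only a single byte.
static void writeUnsignedVarint(int value, ByteArrayOutputStream out) {
    while ((value & 0xFFFFFF80) != 0) {
        out.write((value & 0x7F) | 0x80);  // emit low 7 bits with continuation bit
        value >>>= 7;                      // unsigned shift to the next 7 bits
    }
    out.write(value);                      // final byte, continuation bit clear
}
{code}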

I think it makes sense for values to be byte arrays.  Users might want
to attach arbitrary payloads; they shouldn't be forced to serialize
everything to Java strings.

best,
Colin


On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote:
> Hey Michael,
> 
> Hmm, I guess the point of representing it as bytes is to allow the broker
> to pass it through opaquely? Is the cost of parsing them a concern, or
> are
> we simply trying to ensure that the broker stays agnostic to the format?
> 
> On varints, I think adding support for them makes less sense for an
> isolated use case, but as part of a more holistic change (such as what we
> have proposed in KIP-98), I think they are justifiable. If we add them,
> then the need to use attributes becomes quite a bit weaker, right? The
> other thing I find slightly odd is the fact that a null headers field has no
> actual
> semantic meaning for the message (unlike null keys and values). It is
> just
> a space optimization. It seems a bit better to always use size 0 to
> indicate having no headers.
> 
> Overall, the main point is ensuring that the message schema remains
> consistent, either within the larger protocol, or at a minimum within the
> message itself.
> 
> -Jason
> 
> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce 
> wrote:
> 
> > Hi Jason,
> >
> > On point 1) in the message protocol the headers are simply a byte array,
> > just like the key or value; this is to clearly demarcate the header in the
> > core message. That header byte array in the core message is then an array of
> > key/value pairs. This is what it is denoting.
> >
> > Then this would be I guess in the given notation:
> >
> > Headers => [KeyLength, Key, ValueLength, Value]
> >   KeyLength => int32   <-- NEW: size of the byte[] of the serialised header key
> >   Key => bytes         <-- NEW: serialised string (UTF-8) bytes of the header key
> >   ValueLength => int32 <-- NEW: size of the byte[] of the serialised header value
> >   Value => bytes       <-- NEW: serialised form of the header value
> >
> > The key length and value length match the way the protocol is
> > currently defined in the core message.
> >
> >
> >
> >
> > On point 2)
> > Var-sized ints were discussed much earlier on; in fact I had
> > suggested them myself (with Hadoop references). The complexity of this
> > compared to having a simpler protocol was argued, and it was agreed it wasn’t worth
> > the complexity, as all other clients in other languages would need to ensure
> > they’re using the right var-size algorithm, as there are a few.
> >
> > On point 3)
> > We took the attributes-based, optional approach because originally there was marked
> > concern that headers would cause a message size overhead for others who
> > don’t want them. As such this is the clean solution to achieve that. If
> > that no longer holds, and we don’t care that we add 4 bytes of overhead, then
> > I’m happy to remove it.
> >
> > I’m personally in favour of keeping the message as small as possible so
> > people don’t get shocks in perf and throughput due to message size
> > unless they actively use the feature; as such I do prefer the attribute
> > bitwise feature-flag approach myself.
> >
> >
> >
> >
> > On 16/02/2017, 05:40, "Jason Gustafson"  wrote:
> >
> > We have proposed a few significant changes to the message format in
> > KIP-98
> > which now seems likely to pass (perhaps with some iterations on
> > implementation details). It would be good to try and coordinate the
> > changes
> > in both of the proposals to make sure they are consistent and
> > compatible.
> >
> > I think using the attributes to indicate null headers is a reasonable
> > approach. We have proposed to do the same thing for the message key and
> > value. That said, I sympathize with Jay's argument. Having multiple
> > ways to
> > specify a null value increases the overall complexity of the protocol.
> > You
> > can see this just from the fact that you need the extra verbiage in the
> > protocol specification in this KIP and in KIP-98 to describe the
> > dependence
> > between the fields and the attributes. It seems like a slippery slope
> > if
> > you start allowing different request types to implement the protocol
> > specification differently.
> >
> > You can also argue that the messages already are and are likely to
> > remain a
> > special case. For example, there is currently no generality in how
> > compressed message sets are represented that would be applicable for
> > other
> > request types. Some might see this divergence as an unfortunate
> > protocol
> > deficiency which should be fixed; others might see it as sort of the
> > inevitability of needing to optimize where it counts most. I'm probably
> > somewhere in between, but I think we probabl

[jira] [Comment Edited] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870121#comment-15870121
 ] 

Buğra Gedik edited comment on KAFKA-4767 at 2/16/17 6:31 PM:
-

I understand that it will eventually go away. But that does not cut it for us. 
We would like the IO thread to be shut down after {{close}} returns. And that 
does not happen if we get an interrupt during close(). Yes, eventually it will 
go away. However, various software, such as Tomcat, has thread leak detectors 
that give an alarm based on this behavior. Once join is interrupted, we 
(client code that uses KafkaProducer) have no way of 'waiting' until the IO 
thread goes away.


was (Author: bgedik):
I understand that it will eventually go away. But that does not cut it for us. 
We would like the IO thread to be shutdown after {{close}} returns. And that 
does not happen if we get an interrupt during close(). Yes, eventually it will 
go away. However, various software such as Tomcat, has thread leak detectors 
and they are turned on by this behavior. Once join is interrupted, we (client 
code that uses KafkaProducer) have no way of 'waiting' until the IO thread goes 
away.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> // propagate the interrupt to the IO thread so it can shut down
> this.ioThread.interrupt();
> try {
> this.ioThread.join();
> } catch (InterruptedException e) {
> firstException.compareAndSet(null, e);
> log.error("Interrupted while joining ioThread", e);
> } finally {
> // make sure we maintain the interrupted status
> Thread.currentThread().interrupt();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4686) Null Message payload is shutting down broker

2017-02-16 Thread Hannu Valtonen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870485#comment-15870485
 ] 

Hannu Valtonen commented on KAFKA-4686:
---

Log compaction was not used in our case. Our solution was just to delete the 
whole topic (which BTW unusually required a restart of one of the Kafka nodes 
to go through) since the data in it could be recreated at will.

> Null Message payload is shutting down broker
> 
>
> Key: KAFKA-4686
> URL: https://issues.apache.org/jira/browse/KAFKA-4686
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
> Environment: Amazon Linux AMI release 2016.03 kernel 
> 4.4.19-29.55.amzn1.x86_64
>Reporter: Rodrigo Queiroz Saramago
> Fix For: 0.10.3.0
>
> Attachments: KAFKA-4686-NullMessagePayloadError.tar.xz, 
> kafkaServer.out
>
>
> Hello, I have a test environment with 3 brokers and 1 zookeeper node, in 
> which clients connect using two-way SSL authentication. I use Kafka version 
> 0.10.1.1. The system works as expected for a while, but if a broker goes 
> down and is then restarted, something gets corrupted and it is not possible to 
> start the broker again; it always fails with the same error. What does this 
> error mean? What can I do in this case? Is this the expected behavior?
> [2017-01-23 07:03:28,927] ERROR There was an error in one of the threads 
> during logs loading: kafka.common.KafkaException: Message payload is null: 
> Message(magic = 0, attributes = 1, crc = 4122289508, key = null, payload = 
> null) (kafka.log.LogManager)
> [2017-01-23 07:03:28,929] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes = 1, crc = 4122289508, key = null, payload = null)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.<init>(ByteBufferMessageSet.scala:90)
> at 
> kafka.message.ByteBufferMessageSet$.deepIterator(ByteBufferMessageSet.scala:85)
> at kafka.message.MessageAndOffset.firstOffset(MessageAndOffset.scala:33)
> at kafka.log.LogSegment.recover(LogSegment.scala:223)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> at kafka.log.Log.loadSegments(Log.scala:179)
> at kafka.log.Log.<init>(Log.scala:108)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:151)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:58)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> [2017-01-23 07:03:28,946] INFO shutting down (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,949] INFO Terminate ZkClient event thread. 
> (org.I0Itec.zkclient.ZkEventThread)
> [2017-01-23 07:03:28,954] INFO EventThread shut down for session: 
> 0x159bd458ae70008 (org.apache.zookeeper.ClientCnxn)
> [2017-01-23 07:03:28,954] INFO Session: 0x159bd458ae70008 closed 
> (org.apache.zookeeper.ZooKeeper)
> [2017-01-23 07:03:28,957] INFO shut down completed (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,959] FATAL Fatal error during KafkaServerStartable 
> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes = 1, crc = 4122289508, key = null, payload = null)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.<init>(ByteBufferMessageSet.scala:90)
> at 
> kafka.message.ByteBufferMessageSet$.deepIterator(ByteBufferMessageSet.scala:85)
> at kafka.message.MessageAndOffset.firstOffset(MessageAndOffset.scala:33)
> at kafka.log.LogSegment.recover(LogSegment.scala:223)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> 

Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-16 Thread Jun Rao
Hi, Radai, Mayuresh,

Thanks for the explanation. Good point that a pluggable authorizer can
customize how ACLs are added. However, earlier Mayuresh was saying that in
LinkedIn's customized authorizer it's not possible to create a principal
from a string. If that's the case, will adding the principal builder in
kafka-acl.sh help? If the principal can be constructed from a string,
wouldn't it be simpler to just let kafka-acl.sh do authorization based on
that string name and not be aware of the principal builder? If you still
think there is a need, perhaps you can add a more concrete use case that
can't be done otherwise?


Hi, Mani,

For SASL, if the authorizer needs the full kerberos principal name,
currently, the user can just customize "sasl.kerberos.principal.to.local.rules"
to return the full principal name as the name for authorization, right?
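
As an illustration of that suggestion, a hedged sketch of such a configuration, assuming Hadoop-style auth_to_local rule syntax; the exact rule would need verifying against the broker's rule parser:

{code}
# Sketch only (server.properties): map any user@REALM principal to itself so
# the authorizer sees the full Kerberos principal name rather than the short name.
sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*)s/(.*)/$1/,DEFAULT
{code}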

Thanks,

Jun
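
The minimal change summarized in the quoted thread below, constructing the expected KafkaPrincipal from the channel principal's name inside SimpleAclAuthorizer, might look roughly like this sketch; it assumes the Session now carries the raw java.security.Principal, and is not the final patch:

{code}
// Hedged sketch: build the expected KafkaPrincipal from the
// java.security.Principal carried in the Session, as discussed below.
java.security.Principal channelPrincipal = session.principal();
KafkaPrincipal principal =
    new KafkaPrincipal(KafkaPrincipal.USER_TYPE, channelPrincipal.getName());
{code}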

On Wed, Feb 15, 2017 at 10:25 AM, Mayuresh Gharat <
gharatmayures...@gmail.com> wrote:

> @Jun thanks for the comments.Please see the replies inline.
>
> Currently kafka-acl.sh just creates an ACL path in ZK with the principal
> name string.
> > Yes, the kafka-acl.sh calls the addAcl() on the inbuilt
> SimpleAclAuthorizer which in turn creates an ACL in ZK with the Principal
> name string. This is because we supply the SimpleAclAuthorizer as a
> commandline argument in the Kafka-acls.sh command.
>
> The authorizer module in the broker reads the principal name
> string from the acl path in ZK and creates the expected KafkaPrincipal for
> matching. As you can see, the expected principal is created on the broker
> side, not by the kafka-acl.sh tool.
> > This is considering the fact that the user is using the
> SimpleAclAuthorizer on the broker side and not his own custom Authorizer.
> The SimpleAclAuthorizer will take the Principal it gets from the Session
> class . Currently the Principal is KafkaPrincipal. This KafkaPrincipal is
> generated from the name of the actual channel Principal, in SocketServer
> class when processing completed receives.
> With this KIP, this will no longer be the case as the Session class will
> store a java.security.Principal instead of specific KafkaPrincipal. So the
> SimpleAclAuthorizer will construct the KafkaPrincipal from the channel
> Principal it gets from the Session class.
> User might not want to use the SimpleAclAuthorizer but use his/her own
> custom Authorizer.
>
> The broker already has the ability to
> configure PrincipalBuilder. That's why I am not sure if there is a need for
> kafka-acl.sh to customize PrincipalBuilder.
> > This is exactly the reason why we want to propose a PrincipalBuilder
> in kafka-acls.sh: so that the Principal generated by the PrincipalBuilder on
> the broker is consistent with the one generated while creating ACLs using the
> kafka-acls.sh command line tool.
>
>
> *To summarize the above discussions :*
> What if we only make the following changes: pass the java principal in
> session and in
> SimpleAuthorizer, construct KafkaPrincipal from java principal name. Will
> that work for LinkedIn?
> --> Yes, this works for Linkedin as we are not using the kafka-acls.sh
> tool to create/update/add ACLs, for now.
>
> Do you think there is a use case for a customized authorizer and kafka-acl
> at the
> same time? If not, it's better not to complicate the kafka-acl api.
> -> At Linkedin, we don't use this tool for now. So we are fine with the
> minimal change for now.
>
> Initially, our change was minimal: just getting Kafka to preserve the
> channel principal. Since there was a discussion on the ticket about how
> kafka-acls.sh would work with this change, we designed a detailed solution to
> make this tool generally usable with all sorts of combinations of
> Authorizers and PrincipalBuilders and give more flexibility to the end
> users.
> Without the changes proposed for kafka-acls.sh in this KIP, it cannot be
> used with a custom Authorizer/PrincipalBuilder but will only work with
> SimpleAclAuthorizer.
>
> Although I would actually like it to work for the general scenario, I am fine
> with splitting that into a separate KIP and limiting the scope of this KIP.
> I will update the KIP accordingly, put this under rejected alternatives,
> and create a new KIP for the kafka-acls.sh changes.
>
> @Manikumar
> Since we are limiting the scope of this KIP by not making any changes to
> kafka-acls.sh, I will cover your concern in a separate KIP that I will put
> up for kafka-acls.sh. Does that work?
>
> Thanks,
>
> Mayuresh
>
>
> On Wed, Feb 15, 2017 at 9:18 AM, radai  wrote:
>
> > @jun:
> > "Currently kafka-acl.sh just creates an ACL path in ZK with the principal
> > name string" - yes, but not directly. all it actually does it spin-up the
> > Authorizer and call Authorizer.addAcl() on it.
> > the vanilla Authorizer goes to ZK.
> > but generally speaking, users can plug in their own Authorizers (that can
> > store/load ACLs to/from wherever).
> >
> > it would be nice if users who customize Aut

[jira] [Created] (KAFKA-4773) The Kafka build should run findbugs

2017-02-16 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-4773:
--

 Summary: The Kafka build should run findbugs
 Key: KAFKA-4773
 URL: https://issues.apache.org/jira/browse/KAFKA-4773
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


The Kafka build should run findbugs to find issues that can easily be caught by 
static analysis.
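
A minimal sketch of how this could be wired into a Gradle build, assuming the stock Gradle FindBugs plugin; not necessarily how the linked pull request does it:

{code}
// build.gradle sketch using Gradle's bundled FindBugs plugin.
apply plugin: 'findbugs'

findbugs {
    toolVersion = '3.0.1'    // FindBugs release to run
    ignoreFailures = false   // fail the build on warnings
    effort = 'max'           // deepest analysis
}

tasks.withType(FindBugs) {
    reports {
        xml.enabled = false  // prefer the human-readable report
        html.enabled = true
    }
}
{code}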



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2557: KAFKA-4773: The Kafka build should run findbugs

2017-02-16 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/2557

KAFKA-4773: The Kafka build should run findbugs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-4773

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2557.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2557


commit 5238b6ffcfd4a515fc5999b579cdc64bdbb3d4bf
Author: Colin P. Mccabe 
Date:   2017-02-16T19:48:54Z

KAFKA-4773: The Kafka build should run findbugs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4773) The Kafka build should run findbugs

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870603#comment-15870603
 ] 

ASF GitHub Bot commented on KAFKA-4773:
---

GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/2557

KAFKA-4773: The Kafka build should run findbugs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-4773

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2557.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2557


commit 5238b6ffcfd4a515fc5999b579cdc64bdbb3d4bf
Author: Colin P. Mccabe 
Date:   2017-02-16T19:48:54Z

KAFKA-4773: The Kafka build should run findbugs




> The Kafka build should run findbugs
> ---
>
> Key: KAFKA-4773
> URL: https://issues.apache.org/jira/browse/KAFKA-4773
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> The Kafka build should run findbugs to find issues that can easily be caught 
> by static analysis.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (KAFKA-4773) The Kafka build should run findbugs

2017-02-16 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4773 started by Colin P. McCabe.
--
> The Kafka build should run findbugs
> ---
>
> Key: KAFKA-4773
> URL: https://issues.apache.org/jira/browse/KAFKA-4773
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> The Kafka build should run findbugs to find issues that can easily be caught 
> by static analysis.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Apache Kafka Docker official image

2017-02-16 Thread Gwen Shapira
I'm not sure what "official image" means here...

An image that gets tested by the Apache Kafka system tests? An image
that the PMC votes on as part of the release process? Wouldn't the
Apache official image need to be hosted on the Apache repository?

I'd love to know more on how other Apache projects manage an official
docker image.

Gwen

On Thu, Feb 16, 2017 at 8:04 AM, Gianluca Privitera
 wrote:
> Hi,
> I’m currently proposing an official image for Apache Kafka in the Docker 
> library ( https://github.com/docker-library/official-images/pull/2627 ).
> I wanted to know if someone from Kafka upstream is interested in taking over 
> or whether you are OK with me being the maintainer of the image.
>
> Let me know so I can speed up the process of the image approval.
>
> Thanks
>
> Gianluca Privitera



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


[jira] [Created] (KAFKA-4774) Inner classes which don't need a reference to the outer class should be static

2017-02-16 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-4774:
--

 Summary: Inner classes which don't need a reference to the outer 
class should be static
 Key: KAFKA-4774
 URL: https://issues.apache.org/jira/browse/KAFKA-4774
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


Inner classes which don't need a reference to the outer class should be static. 
 This takes up less space in memory, generates less load on the garbage 
collector, and eliminates a findbugs warning.
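
For illustration, the change this describes is typically as small as adding the static keyword to nested classes that never touch the enclosing instance:

{code}
public class Outer {
    // Before: a non-static inner class implicitly holds a hidden
    // reference to its enclosing Outer instance.
    private class Holder { int value; }

    // After: a static nested class carries no such reference, so each
    // instance is smaller and cannot accidentally keep an Outer alive.
    private static class StaticHolder { int value; }
}
{code}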



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2558: KAFKA-4774. Inner classes which don't need a refer...

2017-02-16 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/2558

KAFKA-4774. Inner classes which don't need a reference to the outer c…

…lass should be static

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-4774

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2558.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2558


commit a44130af1cbb2ba0ac57c32bc9ab5d5e238896c3
Author: Colin P. Mccabe 
Date:   2017-02-16T21:06:53Z

KAFKA-4774. Inner classes which don't need a reference to the outer class 
should be static




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4774) Inner classes which don't need a reference to the outer class should be static

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870702#comment-15870702
 ] 

ASF GitHub Bot commented on KAFKA-4774:
---

GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/2558

KAFKA-4774. Inner classes which don't need a reference to the outer c…

…lass should be static

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-4774

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2558.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2558


commit a44130af1cbb2ba0ac57c32bc9ab5d5e238896c3
Author: Colin P. Mccabe 
Date:   2017-02-16T21:06:53Z

KAFKA-4774. Inner classes which don't need a reference to the outer class 
should be static




> Inner classes which don't need a reference to the outer class should be static
> --
>
> Key: KAFKA-4774
> URL: https://issues.apache.org/jira/browse/KAFKA-4774
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Inner classes which don't need a reference to the outer class should be 
> static.  This takes up less space in memory, generates less load on the 
> garbage collector, and eliminates a findbugs warning.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: kafka-trunk-jdk8 #1278

2017-02-16 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-4765; Fixed Intentionally Broken Hosts Resolving to 127.0.53.53 
in

--
[...truncated 17523 lines...]
org.apache.kafka.streams.KafkaStreamsTest > testIllegalMetricsConfig STARTED

org.apache.kafka.streams.KafkaStreamsTest > testIllegalMetricsConfig PASSED

org.apache.kafka.streams.KafkaStreamsTest > shouldNotGetAllTasksWhenNotRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > shouldNotGetAllTasksWhenNotRunning 
PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
testInitializesAndDestroysMetricsReporters STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
testInitializesAndDestroysMetricsReporters PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetTaskWithKeyAndPartitionerWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetTaskWithKeyAndPartitionerWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testLegalMetricsConfig STARTED

org.apache.kafka.streams.KafkaStreamsTest > testLegalMetricsConfig PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetTaskWithKeyAndSerializerWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetTaskWithKeyAndSerializerWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldReturnFalseOnCloseWhenThreadsHaventTerminated STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldReturnFalseOnCloseWhenThreadsHaventTerminated PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup PASSED

org.apache.kafka.streams.KafkaStreamsTest > testNumberDefaultMetrics STARTED

org.apache.kafka.streams.KafkaStreamsTest > testNumberDefaultMetrics PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice PASSED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableLeftJoin STARTED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableLeftJoin PASSED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableJoin STARTED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableJoin PASSED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldThrowExceptionOverlappingPattern STARTED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldThrowExceptionOverlappingPattern PASSED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldThrowExceptionOverlappingTopic STARTED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldThrowExceptionOverlappingTopic PASSED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldThrowStreamsExceptionNoResetSpecified STARTED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldThrowStreamsExceptionNoResetSpecified PASSED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldOnlyReadRecordsWhereEarliestSpecified STARTED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldOnlyReadRecordsWhereEarliestSpecified PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAc

[jira] [Resolved] (KAFKA-4709) Error message from Struct.validate() should include the name of the offending field.

2017-02-16 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-4709.
--
   Resolution: Fixed
Fix Version/s: 0.10.3.0

Issue resolved by pull request 2521
[https://github.com/apache/kafka/pull/2521]

> Error message from Struct.validate() should include the name of the offending 
> field.
> 
>
> Key: KAFKA-4709
> URL: https://issues.apache.org/jira/browse/KAFKA-4709
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Jeremy Custenborder
>Assignee: Jeremy Custenborder
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> Take a look at this repro.
> {code}
>   @Test
>   public void structValidate() {
> Schema schema = SchemaBuilder.struct()
> .field("one", Schema.STRING_SCHEMA)
> .field("two", Schema.STRING_SCHEMA)
> .field("three", Schema.STRING_SCHEMA)
> .build();
> Struct struct = new Struct(schema);
> struct.validate();
>   }
> {code}
> Any one of the fields could be causing the issue. The following exception is 
> thrown. This makes troubleshooting missing fields in connectors much more 
> difficult.
> {code}
> org.apache.kafka.connect.errors.DataException: Invalid value: null used for 
> required field
> {code}
> The error message should include the field or fields in the error message.
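
A minimal sketch of the field-aware check being requested -- illustrative only, not necessarily the patch merged in PR 2521:

{code}
// Illustrative sketch: report which required field was null.
for (Field field : schema.fields()) {
    Object value = struct.get(field);
    if (value == null && !field.schema().isOptional())
        throw new DataException(
            "Invalid value: null used for required field \"" + field.name() + "\"");
}
{code}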



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4775) Fix findbugs warnings in kafka-tools

2017-02-16 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-4775:
--

 Summary: Fix findbugs warnings in kafka-tools
 Key: KAFKA-4775
 URL: https://issues.apache.org/jira/browse/KAFKA-4775
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


Fix findbugs warnings in kafka-tools



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2521: KAFKA-4709:Error message from Struct.validate() ...

2017-02-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2521


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2559: KAFKA-4775. Fix findbugs warnings in kafka-tools

2017-02-16 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/2559

KAFKA-4775. Fix findbugs warnings in kafka-tools



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-4775

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2559


commit b8baf58cb6032453a66dbdfbea24b1e0fbfb
Author: Colin P. Mccabe 
Date:   2017-02-16T21:44:19Z

KAFKA-4775. Fix findbugs warnings in kafka-tools




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4709) Error message from Struct.validate() should include the name of the offending field.

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870744#comment-15870744
 ] 

ASF GitHub Bot commented on KAFKA-4709:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2521


> Error message from Struct.validate() should include the name of the offending 
> field.
> 
>
> Key: KAFKA-4709
> URL: https://issues.apache.org/jira/browse/KAFKA-4709
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Jeremy Custenborder
>Assignee: Jeremy Custenborder
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> Take a look at this repro.
> {code}
>   @Test
>   public void structValidate() {
> Schema schema = SchemaBuilder.struct()
> .field("one", Schema.STRING_SCHEMA)
> .field("two", Schema.STRING_SCHEMA)
> .field("three", Schema.STRING_SCHEMA)
> .build();
> Struct struct = new Struct(schema);
> struct.validate();
>   }
> {code}
> Any one of the fields could be causing the issue. The following exception is 
> thrown. This makes troubleshooting missing fields in connectors much more 
> difficult.
> {code}
> org.apache.kafka.connect.errors.DataException: Invalid value: null used for 
> required field
> {code}
> The error message should include the field or fields in the error message.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4775) Fix findbugs warnings in kafka-tools

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870746#comment-15870746
 ] 

ASF GitHub Bot commented on KAFKA-4775:
---

GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/2559

KAFKA-4775. Fix findbugs warnings in kafka-tools



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-4775

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2559


commit b8baf58cb6032453a66dbdfbea24b1e0fbfb
Author: Colin P. Mccabe 
Date:   2017-02-16T21:44:19Z

KAFKA-4775. Fix findbugs warnings in kafka-tools




> Fix findbugs warnings in kafka-tools
> 
>
> Key: KAFKA-4775
> URL: https://issues.apache.org/jira/browse/KAFKA-4775
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Fix findbugs warnings in kafka-tools



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-16 Thread Dong Lin
Hey Colin,

Thanks for the update. I have two comments:

- I actually think it is simpler and good enough to have a per-topic API
instead of a batch-of-topic API. This is different from the argument for a
batch-of-partition API because, unlike operations on topics, people usually
operate on multiple partitions (e.g. seek(), purge()) at a time. Is there a
performance concern with a per-topic API? I am wondering if we should stick
with a per-topic API until there is a use case or performance benefit for a
batch-of-topic API.

- Currently we have interface "Consumer" and "Producer". And we also have
implementations of these two interfaces as "KafkaConsumer" and
"KafkaProducer". If we follow the same naming pattern, should we have
interface "AdminClient" and the implementation "KafkaAdminClient", instead
of the other way around?

Dong


On Thu, Feb 16, 2017 at 10:19 AM, Colin McCabe  wrote:

> Hi all,
>
> So I think people have made some very good points so far.  There seems
> to be agreement that we need to have explicit batch APIs for the sake of
> efficiency, so I added that back.
>
> Contexts seem a little more complex than we thought, so I removed that
> from the proposal.
>
> I removed the Impl class.  Instead, we now have a KafkaAdminClient
> interface and an AdminClient implementation.  I think this matches our
> other code better, as Jay commented.
>
> Each call now has an "Options" object that is passed in.  This will
> allow us to easily add new parameters to the calls without having tons
> of function overloads.  Similarly, each call now has a Results object,
> which will let us easily extend the results we are returning if needed.
>
> Many people made the point that Java 7 Futures are not that useful, but
> Java 8 CompletableFutures are.  With CompletableFutures, you can chain
> calls, adapt them, join them-- basically all the stuff people are doing
> in Node.js and Twisted Python.  Java 7 Futures don't really let you do
> anything but poll for a value or block.  So I felt that it was better to
> just go with a CompletableFuture-based API.
>
> People also made the point that they would like an easy API for waiting
> on complete success of a batch call.  For example, an API that would
> fail if even one topic wasn't created in createTopics.  So I came up
> with Result objects that provide multiple futures that you can wait on.
> You can wait on a future that fires when everything is complete, or you
> can wait on futures for individual topics being created.
>
> I updated the wiki, so please take a look.  Note that this new API
> requires JDK8.  It seems like JDK8 is coming soon, though, and the
> disadvantages of sticking to Java 7 are pretty big here, I think.
>
> best,
> Colin
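
For readers less familiar with the JDK8 API referenced above, a self-contained illustration of the chaining and joining that plain Java 7 Futures cannot express:

{code}
import java.util.concurrent.CompletableFuture;

public class ComposeExample {
    public static void main(String[] args) {
        CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> "topic-a");
        CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> "topic-b");
        // Chain, adapt and join the results; with a plain java.util.concurrent.Future
        // the only options are polling or blocking.
        CompletableFuture.allOf(a, b)
                .thenRun(() -> System.out.println("both calls completed"))
                .join();
    }
}
{code}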
>
>
> On Mon, Feb 13, 2017, at 11:51, Colin McCabe wrote:
> > On Sun, Feb 12, 2017, at 09:21, Jay Kreps wrote:
> > > Hey Colin,
> > >
> > > Thanks for the hard work on this. I know going back and forth on APIs
> is
> > > kind of frustrating but we're at the point where these things live long
> > > enough and are used by enough people that it is worth the pain. I'm
> sure
> > > it'll come down in the right place eventually. A couple things I've
> found
> > > helped in the past:
> > >
> > >1. The burden of evidence needs to fall on the complicator. i.e. if
> > >person X thinks the api should be async they need to produce a set
> of
> > >common use cases that require this. Otherwise you are perpetually
> > >having to
> > >think "we might need x". I think it is good to have a rule of
> "simple
> > >until
> > >proven insufficient".
> > >2. Make sure we frame things for the intended audience. At this
> point
> > >our apis get used by a very broad set of Java engineers. This is a
> > >very
> > >different audience from our developer mailing list. These people
> code
> > >for a
> > >living not necessarily as a passion, and may not understand details
> of
> > >the
> > >internals of our system or even basic things like multi-threaded
> > >programming. I don't think this means we want to dumb things down,
> but
> > >rather try really hard to make things truly simple when possible.
> > >
> > > Okay here were a couple of comments:
> > >
> > >1. Conceptually what is a TopicContext? I think it means something
> > >like
> > >TopicAdmin? It is not literally context about Topics right? What is
> > >the
> > >relationship of Contexts to clients? Is there a threadsafety
> > >difference?
> > >Would be nice to not have to think about this, this is what I mean
> by
> > >"conceptual weight". We introduce a new concept that is a bit
> nebulous
> > >that
> > >I have to figure out to use what could be a simple api. I'm sure
> > >you've
> > >been through this experience before where you have these various
> > >objects
> > >and you're trying to figure out what they represent (the connection
> to
> > >the
> > >server? the information to create a conne

Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-16 Thread Jun Rao
Hi, Apurva,

Thanks for the reply. A couple of comment below.

On Wed, Feb 15, 2017 at 9:45 PM, Apurva Mehta  wrote:

> Hi Jun,
>
> Answers inline:
>
> 210. Pid snapshots: Is the number of pid snapshot configurable or hardcoded
> > with 2? When do we decide to roll a new snapshot? Based on time, byte, or
> > offset? Is that configurable too?
> >
>
> These are good questions. We haven't fleshed out the policy by which the
> snapshots will be generated. I guess there will be some scheduled task that
> takes snapshots, and we will retain the two latest ones. These will map
> to different points in time, but each will be the complete view of the
> PID->Sequence map at the point in time it was created. At start time, the
> Pid mapping will be built from the latest snapshot unless it is somehow
> corrupt, in which case the older one will be used. Otherwise, the older one
> will be ignored.
>
> I don't think there is a good reason to keep more than one, to be honest. If
> the snapshot is corrupt, we can always rebuild the map from the log itself.
> I have updated the doc to state that there will be exactly one snapshot
> file.
>
> With one snapshot, we don't need additional configs. Do you agree?
>
>
>
When a replica becomes a follower, we do a bit of log truncation. Having an
older snapshot allows us to recover the PID->sequence mapping much quicker
than rescanning the whole log.



> >
> > 211. I am wondering if we should store ExpirationTime in the producer
> > transactionalId mapping message as we do in the producer transaction
> status
> > message. If a producer only calls initTransactions(), but never publishes
> > any data, we still want to be able to expire and remove the producer
> > transactionalId mapping message.
> >
> >
> Actually, the document was inaccurate. The transactionalId will be expired
> only if there is no active transaction, and the age of the last transaction
> with that transactionalId is older than the transactionalId expiration
> time. With these semantics, storing the expiration time in the
> transactionalId mapping message won't be useful, since the expiration time
> is a moving target based on transaction activity.
>
> I have updated the doc with a clarification.
>
>
>
Currently, the producer transactionalId mapping message doesn't carry
ExpirationTime, but the producer transaction status message does.  It would
be useful if they are consistent.


>
> > 212. The doc says "The LSO is always equal to one less than the minimum
> of
> > the initial offsets across all active transactions". This implies that
> LSO
> > is inclusive. However, currently, both high watermark and log end offsets
> > are exclusive. For consistency, it seems that we should make LSO
> exclusive
> > as well.
> >
>
> Sounds good. Doc updated.
>
>
> >
> > 213. The doc says "If the topic is configured for compaction and
> deletion,
> > we will use the topic’s own retention limit. Otherwise, we will use the
> > default topic retention limit. Once the last message set produced by a
> > given PID has aged beyond the retention time, the PID will be expired."
> For
> > topics configured with just compaction, it seems it's more intuitive to
> > expire PID based on transactional.id.expiration.ms?
> >
>
> This could work. The problem is that all idempotent producers get a PID,
> but only transactional producers have a transactionalId. Since they are
> separate concepts, it seems better not to conflate PID expiration with the
> settings for transactionalId expiration.
>
> Another way of putting it: it seems more natural for the retention of the
> PID to be based on the retention of the messages in a topic. The
> transactionalId expiration is based on the transaction activity of a
> producer, which is quite a different thing.
>
> But I see your point: having these separate can put us in a situation where
> the PID gets expired even if the transactionalId is not. To solve this we
> should set the default transactionalId expiration time to be less than the
> default topic retention time, so that the transactionalId can be expired
> before the PID. Does that seem reasonable?
>
>
> >
> > 214. In the Coordinator-Broker request handling section, the doc says "If
> > the broker has a corresponding PID, verify that the received epoch is
> > greater than or equal to the current epoch. " Is the epoch the
> coordinator
> > epoch or the producer epoch?
> >
>
> This is the producer epoch, as known by the coordinator. I have clarified
> this in the document.  However, there may be producers with the same PID on
> a different epoch, which will be fenced out on future calls.
>
>
> > 215. The doc says "Append to the offset topic, but skip updating the
> offset
> > cache in the delayed produce callback, until a WriteTxnMarkerRequest from
> > the transaction coordinator is received including the offset topic
> > partitions." How do we do this efficiently? Do we need to cache pending
> > offsets per pid?
> >
>
> I think we would need to cache the co

[jira] [Commented] (KAFKA-4686) Null Message payload is shutting down broker

2017-02-16 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870806#comment-15870806
 ] 

Jason Gustafson commented on KAFKA-4686:


I'm at a loss as to how this can happen. I tried to create a compressed message 
set with a null value and send it to the broker, but it was properly rejected. 
The only other paths by which messages can be written to the log are the log 
cleaner and the group metadata manager (excluding replication, which depends on 
a successful initial write to the log), but both seem to have been ruled out, 
and the validation appears correct in any case. 

Interestingly, I did find that it is possible to write an empty compressed 
message set to the log. In this case, the value of the wrapper message would 
not be null (as is the case here), but it wouldn't contain any messages. I will 
create a separate JIRA for this. This could be somehow related to this problem, 
but I haven't thought of how yet.

[~Ormod] If you hit this again, can you try running the DumpLogSegments utility 
against the log segment containing the invalid data? It would also help to see 
the topic and client configs, and to know the client version.
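
For reference, an invocation along the following lines should work with the 0.10.x tooling; the segment path here is illustrative:

{code}
$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
    --files /var/kafka-logs/mytopic-0/00000000000000000000.log \
    --deep-iteration --print-data-log
{code}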

> Null Message payload is shutting down broker
> 
>
> Key: KAFKA-4686
> URL: https://issues.apache.org/jira/browse/KAFKA-4686
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
> Environment: Amazon Linux AMI release 2016.03 kernel 
> 4.4.19-29.55.amzn1.x86_64
>Reporter: Rodrigo Queiroz Saramago
> Fix For: 0.10.3.0
>
> Attachments: KAFKA-4686-NullMessagePayloadError.tar.xz, 
> kafkaServer.out
>
>
> Hello, I have a test environment with 3 brokers and 1 zookeeper node, in 
> which clients connect using two-way ssl authentication. I use kafka version 
> 0.10.1.1. The system works as expected for a while, but if a broker goes 
> down and is then restarted, something gets corrupted and it is not possible 
> to start the broker again; it always fails with the same error. What does 
> this error mean? What can I do in this case? Is this the expected behavior?
> [2017-01-23 07:03:28,927] ERROR There was an error in one of the threads 
> during logs loading: kafka.common.KafkaException: Message payload is null: 
> Message(magic = 0, attributes = 1, crc = 4122289508, key = null, payload = 
> null) (kafka.log.LogManager)
> [2017-01-23 07:03:28,929] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes = 1, crc = 4122289508, key = null, payload = null)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.<init>(ByteBufferMessageSet.scala:90)
> at 
> kafka.message.ByteBufferMessageSet$.deepIterator(ByteBufferMessageSet.scala:85)
> at kafka.message.MessageAndOffset.firstOffset(MessageAndOffset.scala:33)
> at kafka.log.LogSegment.recover(LogSegment.scala:223)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> at kafka.log.Log.loadSegments(Log.scala:179)
> at kafka.log.Log.<init>(Log.scala:108)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:151)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:58)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> [2017-01-23 07:03:28,946] INFO shutting down (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,949] INFO Terminate ZkClient event thread. 
> (org.I0Itec.zkclient.ZkEventThread)
> [2017-01-23 07:03:28,954] INFO EventThread shut down for session: 
> 0x159bd458ae70008 (org.apache.zookeeper.ClientCnxn)
> [2017-01-23 07:03:28,954] INFO Session: 0x159bd458ae70008 closed 
> (org.apache.zookeeper.ZooKeeper)
> [2017-01-23 07:03:28,957] INFO shut down completed (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,959] FATAL Fatal error during KafkaServerStartable 
> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-16 Thread Colin McCabe
On Thu, Feb 16, 2017, at 14:11, Dong Lin wrote:
> Hey Colin,
> 
> Thanks for the update. I have two comments:
> 
> - I actually think it is simpler and good enough to have a per-topic API
> instead of a batch-of-topic API. This is different from the argument for a
> batch-of-partition API because, unlike operations on topics, people usually
> operate on multiple partitions (e.g. seek(), purge()) at a time. Is there a
> performance concern with a per-topic API? I am wondering if we should stick
> with a per-topic API until there is a use case or performance benefit for a
> batch-of-topic API.

Yes, there is a performance concern with only supporting operations on
one topic at a time.  Jay expressed this in some of his earlier emails
and some other people did as well.  We have cases in mind for management
software where many topics are created at once.

> 
> - Currently we have interface "Consumer" and "Producer". And we also have
> implementations of these two interfaces as "KafkaConsumer" and
> "KafkaProducer". If we follow the same naming pattern, should we have
> interface "AdminClient" and the implementation "KafkaAdminClient",
> instead
> of the other way around?

That's a good point.  We should do that for consistency.

best,
Colin
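
A hypothetical sketch of the resulting shape -- every name below is illustrative, not the final KIP-117 API -- showing a batch call together with the whole-batch and per-topic futures discussed above:

{code}
// Hypothetical sketch only; names are illustrative, not the final KIP-117 API.
AdminClient client = AdminClient.create(props);             // interface named per the convention above
CreateTopicsResults results = client.createTopics(topics);  // batch-of-topics call
results.all().join();                                       // fails if any single topic failed
results.topicFuture("my-topic")                             // or wait on a single topic
       .thenRun(() -> System.out.println("my-topic created"));
{code}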

> 
> Dong
> 
> 
> On Thu, Feb 16, 2017 at 10:19 AM, Colin McCabe 
> wrote:
> 
> > Hi all,
> >
> > So I think people have made some very good points so far.  There seems
> > to be agreement that we need to have explicit batch APIs for the sake of
> > efficiency, so I added that back.
> >
> > Contexts seem a little more complex than we thought, so I removed that
> > from the proposal.
> >
> > I removed the Impl class.  Instead, we now have a KafkaAdminClient
> > interface and an AdminClient implementation.  I think this matches our
> > other code better, as Jay commented.
> >
> > Each call now has an "Options" object that is passed in.  This will
> > allow us to easily add new parameters to the calls without having tons
> > of function overloads.  Similarly, each call now has a Results object,
> > which will let us easily extend the results we are returning if needed.
> >
> > Many people made the point that Java 7 Futures are not that useful, but
> > Java 8 CompletableFutures are.  With CompletableFutures, you can chain
> > calls, adapt them, join them-- basically all the stuff people are doing
> > in Node.js and Twisted Python.  Java 7 Futures don't really let you do
> > anything but poll for a value or block.  So I felt that it was better to
> > just go with a CompletableFuture-based API.
> >
> > People also made the point that they would like an easy API for waiting
> > on complete success of a batch call.  For example, an API that would
> > fail if even one topic wasn't created in createTopics.  So I came up
> > with Result objects that provide multiple futures that you can wait on.
> > You can wait on a future that fires when everything is complete, or you
> > can wait on futures for individual topics being created.
> >
> > I updated the wiki, so please take a look.  Note that this new API
> > requires JDK8.  It seems like JDK8 is coming soon, though, and the
> > disadvantages of sticking to Java 7 are pretty big here, I think.
> >
> > best,
> > Colin
> >
> >
> > On Mon, Feb 13, 2017, at 11:51, Colin McCabe wrote:
> > > On Sun, Feb 12, 2017, at 09:21, Jay Kreps wrote:
> > > > Hey Colin,
> > > >
> > > > Thanks for the hard work on this. I know going back and forth on APIs
> > is
> > > > kind of frustrating but we're at the point where these things live long
> > > > enough and are used by enough people that it is worth the pain. I'm
> > sure
> > > > it'll come down in the right place eventually. A couple things I've
> > found
> > > > helped in the past:
> > > >
> > > >1. The burden of evidence needs to fall on the complicator. i.e. if
> > > >person X thinks the api should be async they need to produce a set
> > of
> > > >common use cases that require this. Otherwise you are perpetually
> > > >having to
> > > >think "we might need x". I think it is good to have a rule of
> > "simple
> > > >until
> > > >proven insufficient".
> > > >2. Make sure we frame things for the intended audience. At this
> > point
> > > >our apis get used by a very broad set of Java engineers. This is a
> > > >very
> > > >different audience from our developer mailing list. These people
> > code
> > > >for a
> > > >living not necessarily as a passion, and may not understand details
> > of
> > > >the
> > > >internals of our system or even basic things like multi-threaded
> > > >programming. I don't think this means we want to dumb things down,
> > but
> > > >rather try really hard to make things truly simple when possible.
> > > >
> > > > Okay here were a couple of comments:
> > > >
> > > >1. Conceptually what is a TopicContext? I think it means something
> > > >like
> > > >TopicAdmin? It is not literally context about Topics right? What is
> > > > 

[jira] [Created] (KAFKA-4776) Implement graceful handling for improperly formed compressed message sets

2017-02-16 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-4776:
--

 Summary: Implement graceful handling for improperly formed 
compressed message sets
 Key: KAFKA-4776
 URL: https://issues.apache.org/jira/browse/KAFKA-4776
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 0.10.1.1, 0.10.1.0, 0.10.0.1, 0.10.0.0, 0.10.2.0
Reporter: Jason Gustafson
Assignee: Jason Gustafson
Priority: Minor


This affects validation of compressed message sets. It is possible for a buggy 
client to send both a null compressed message set (i.e. a wrapper message with 
a null value), and an empty compressed message set (i.e. a wrapper message with 
valid compressed data in the value field, but no actual records). In both 
cases, this causes an unexpected exception raised from the deep iteration, 
which is returned to the client as an UNKNOWN_ERROR. It would be better to 
return a CORRUPT_MESSAGE error.

Note also that the behavior of the empty case was potentially more problematic 
in versions prior to 0.10.2.0. Although we properly handled the null case, the 
broker would accept the empty message set and write it to the log. The impact 
of this appears to be minor, but may cause unexpected behavior in cases where 
we assume compressed message sets would contain some records.
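
A sketch of the graceful handling being proposed -- illustrative only, not the eventual patch; deepRecords() is a stand-in for the broker's deep-iteration entry point:

{code}
try {
    Iterator<Record> deep = deepRecords(wrapper);   // deepRecords() is a hypothetical helper
    if (!deep.hasNext())
        throw new InvalidRecordException("Compressed wrapper contains no records");
    while (deep.hasNext())
        deep.next().ensureValid();                  // CRC and size checks
} catch (InvalidRecordException e) {
    return Errors.CORRUPT_MESSAGE;                  // instead of leaking UNKNOWN_ERROR
}
{code}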



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

2017-02-16 Thread Jeyhun Karimov
Hi Matthias,

Done.

On Thu, Feb 16, 2017 at 7:24 PM Matthias J. Sax 
wrote:

> Jeyhun,
>
> can you please add the KIP to this table:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-KIPsunderdiscussion
>
> and to this list:
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams
>
> Thanks!
>
>
> -Matthias
>
> On 2/14/17 5:36 PM, Matthias J. Sax wrote:
> > Mathieu,
> >
> > I personally agree with your observation, and we have plans to submit a
> > KIP like this. If you want to drive this discussion feel free to start
> > the KIP by yourself!
> >
> > Having said that, for this KIP we might want to focus the discussion the
> > the actual feature that gets added: allowing to specify different
> > TS-Extractor for different inputs.
> >
> >
> >
> > -Matthias
> >
> > On 2/14/17 4:54 PM, Mathieu Fenniak wrote:
> >> Hi Jeyhun,
> >>
> >> This KIP might not be the appropriate time, but my first thought
> reading it
> >> is that it might make sense to introduce a builder-style API rather than
> >> adding a mix of new method overloads with independent optional
> parameters.
> >> :-)
> >>
> >> eg. stream(), table(), globalTable(), addSource(), could all accept a
> >> "TopicReference" parameter that can be built like:
> >>
> TopicReference("my-topic").keySerde(...).valueSerde(...).autoOffsetReset(...).timestampExtractor(...).build().
> >>
> >> Mathieu
> >>
> >>
> >> On Tue, Feb 14, 2017 at 5:31 PM, Jeyhun Karimov 
> >> wrote:
> >>
> >>> Dear community,
> >>>
> >>> I want to share the KIP-123 [1] which is based on issue KAFKA-4144
> [2]. You
> >>> can check the PR in [3].
> >>>
> >>> I would like to get your comments.
> >>>
> >>> [1]
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68714788
> >>> [2] https://issues.apache.org/jira/browse/KAFKA-4144
> >>> [3] https://github.com/apache/kafka/pull/2466
> >>>
> >>>
> >>> Cheers,
> >>> Jeyhun
> >>> --
> >>> -Cheers
> >>>
> >>> Jeyhun
> >>>
> >>
> >
>
> --
-Cheers

Jeyhun
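
A hypothetical sketch of the builder-style alternative Mathieu describes above; every name is illustrative and nothing here is part of the KIP-123 proposal:

{code}
// Hypothetical sketch only; TopicReference and the stream(ref) overload do not exist.
TopicReference ref = TopicReference.to("my-topic")
        .keySerde(Serdes.String())
        .valueSerde(Serdes.String())
        .autoOffsetReset(AutoOffsetReset.EARLIEST)
        .timestampExtractor(new WallclockTimestampExtractor())
        .build();
KStream<String, String> stream = builder.stream(ref);
{code}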


[GitHub] kafka pull request #2536: MINOR: Move compression stream construction into C...

2017-02-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2536


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2525: KAFKA-4484: Set more conservative default values o...

2017-02-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2525


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4484) Set more conservative default values on RocksDB for memory usage

2017-02-16 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4484:
-
   Resolution: Fixed
Fix Version/s: 0.10.3.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2525
[https://github.com/apache/kafka/pull/2525]

> Set more conservative default values on RocksDB for memory usage
> 
>
> Key: KAFKA-4484
> URL: https://issues.apache.org/jira/browse/KAFKA-4484
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: performance
> Fix For: 0.10.3.0
>
>
> Quoting from email thread:
> {code}
> The block cache size defaults to a whopping 100Mb per store, and that gets
> expensive
> fast. I reduced it to a few megabytes. My data size is so big that I doubt
> it is very effective
> anyway. Now it seems more stable.
> I'd say that a smaller default makes sense, especially because the failure
> case is
> so opaque (running all tests just fine but with a serious dataset it dies
> slowly)
> {code}
> {code}
> Before we have the single-knob memory management feature, I'd like to 
> propose reducing the Streams' default config values for RocksDB caching and 
> memory block size. For example, I remember Henry has done some fine tuning on 
> the RocksDB config for his use case:
> https://github.com/HenryCaiHaiying/kafka/commit/b297f7c585f5a883ee068277e5f0f1224c347bd4
> https://github.com/HenryCaiHaiying/kafka/commit/eed1726d16e528d813755a6e66b49d0bf14e8803
> https://github.com/HenryCaiHaiying/kafka/commit/ccc4e25b110cd33eea47b40a2f6bf17ba0924576
> We could check if some of those changes are appropriate in general and if yes 
> change the default settings accordingly.
> {code}
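
One way to apply such tuning today is a custom RocksDBConfigSetter; a minimal sketch, where the 2 MB figure is purely illustrative rather than a recommended default:

{code}
public class SmallCacheConfigSetter implements RocksDBConfigSetter {
    @Override
    public void setConfig(String storeName, Options options, Map<String, Object> configs) {
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(2 * 1024 * 1024L);  // 2 MB instead of the 100 MB default
        options.setTableFormatConfig(tableConfig);
    }
}
// Registered via:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, SmallCacheConfigSetter.class);
{code}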



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4484) Set more conservative default values on RocksDB for memory usage

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870883#comment-15870883
 ] 

ASF GitHub Bot commented on KAFKA-4484:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2525


> Set more conservative default values on RocksDB for memory usage
> 
>
> Key: KAFKA-4484
> URL: https://issues.apache.org/jira/browse/KAFKA-4484
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: performance
> Fix For: 0.10.3.0
>
>
> Quoting from email thread:
> {code}
> The block cache size defaults to a whopping 100Mb per store, and that gets
> expensive
> fast. I reduced it to a few megabytes. My data size is so big that I doubt
> it is very effective
> anyway. Now it seems more stable.
> I'd say that a smaller default makes sense, especially because the failure
> case is
> so opaque (running all tests just fine but with a serious dataset it dies
> slowly)
> {code}
> {code}
> Before we have the single-knob memory management feature, I'd like to 
> propose reducing the Streams' default config values for RocksDB caching and 
> memory block size. For example, I remember Henry has done some fine tuning on 
> the RocksDB config for his use case:
> https://github.com/HenryCaiHaiying/kafka/commit/b297f7c585f5a883ee068277e5f0f1224c347bd4
> https://github.com/HenryCaiHaiying/kafka/commit/eed1726d16e528d813755a6e66b49d0bf14e8803
> https://github.com/HenryCaiHaiying/kafka/commit/ccc4e25b110cd33eea47b40a2f6bf17ba0924576
> We could check if some of those changes are appropriate in general and if yes 
> change the default settings accordingly.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: kafka-trunk-jdk8 #1279

2017-02-16 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4709:Error message from Struct.validate() should include the 
name

--
[...truncated 8412 lines...]

kafka.utils.timer.TimerTest > testTaskExpiration STARTED

kafka.utils.timer.TimerTest > testTaskExpiration PASSED

kafka.utils.timer.TimerTaskListTest > testAll STARTED

kafka.utils.timer.TimerTaskListTest > testAll PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg PASSED

kafka.utils.CommandLineUtilsTest > testParseSingleArg STARTED

kafka.utils.CommandLineUtilsTest > testParseSingleArg PASSED

kafka.utils.CommandLineUtilsTest > testParseArgs STARTED

kafka.utils.CommandLineUtilsTest > testParseArgs PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid PASSED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr STARTED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr PASSED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition STARTED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition PASSED

kafka.utils.JsonTest > testJsonEncoding STARTED

kafka.utils.JsonTest > testJsonEncoding PASSED

kafka.utils.ShutdownableThreadTest > testShutdownWhenCalledAfterThreadStart 
STARTED

kafka.utils.ShutdownableThreadTest > testShutdownWhenCalledAfterThreadStart 
PASSED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask PASSED

kafka.utils.SchedulerTest > testNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testRestart STARTED

kafka.utils.SchedulerTest > testRestart PASSED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler STARTED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler PASSED

kafka.utils.SchedulerTest > testPeriodicTask STARTED

kafka.utils.SchedulerTest > testPeriodicTask PASSED

kafka.utils.ZkUtilsTest > testAbortedConditionalDeletePath STARTED

kafka.utils.ZkUtilsTest > testAbortedConditionalDeletePath PASSED

kafka.utils.ZkUtilsTest > testSuccessfulConditionalDeletePath STARTED

kafka.utils.ZkUtilsTest > testSuccessfulConditionalDeletePath PASSED

kafka.utils.ZkUtilsTest > testClusterIdentifierJsonParsing STARTED

kafka.utils.ZkUtilsTest > testClusterIdentifierJsonParsing PASSED

kafka.utils.IteratorTemplateTest > testIterator STARTED

kafka.utils.IteratorTemplateTest > testIterator PASSED

kafka.utils.UtilsTest > testGenerateUuidAsBase64 STARTED

kafka.utils.UtilsTest > testGenerateUuidAsBase64 PASSED

kafka.utils.UtilsTest > testAbs STARTED

kafka.utils.UtilsTest > testAbs PASSED

kafka.utils.UtilsTest > testReplaceSuffix STARTED

kafka.utils.UtilsTest > testReplaceSuffix PASSED

kafka.utils.UtilsTest > testCircularIterator STARTED

kafka.utils.UtilsTest > testCircularIterator PASSED

kafka.utils.UtilsTest > testReadBytes STARTED

kafka.utils.UtilsTest > testReadBytes PASSED

kafka.utils.UtilsTest > testCsvList STARTED

kafka.utils.UtilsTest > testCsvList PASSED

kafka.utils.UtilsTest > testReadInt STARTED

kafka.utils.UtilsTest > testReadInt PASSED

kafka.utils.UtilsTest > testUrlSafeBase64EncodeUUID STARTED

kafka.utils.UtilsTest > testUrlSafeBase64EncodeUUID PASSED

kafka.utils.UtilsTest > testCsvMap STARTED

kafka.utils.UtilsTest > testCsvMap PASSED

kafka.utils.UtilsTest > testInLock STARTED

kafka.utils.UtilsTest > testInLock PASSED

kafka.utils.UtilsTest > testSwallow STARTED

kafka.utils.UtilsTest > testSwallow PASSED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic STARTED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic PASSED

kafka.producer.AsyncProducerTest > testQueueTimeExpired STARTED

kafka.producer.AsyncProducerTest > testQueueTimeExpired PASSED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents STARTED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents PASSED

kafka.producer.AsyncProducerTest > testBatchSize STARTED

kafka.producer.AsyncProducerTest > testBatchSize PASSED

kafka.producer.AsyncProducerTest > testSerializeEvents STARTED

kafka.producer.AsyncProducerTest > testSerializeEvents PASSED

kafka.producer.AsyncProducerTest > testProducerQueueSize STARTED

kafka.producer.AsyncProducerTest > testProducerQueueSize PASSED

kafka.producer.AsyncProducerTest > testRandomPartitioner STARTED

kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED

kafka.producer.AsyncProducerTest > testInvalidConfiguration STARTED

kafka.producer.AsyncProducerTest > testInvalidConfiguration PASSED

kafka.producer.AsyncProducerTest > testInvalidPartition STARTED

kafka.producer.AsyncProducerTest > testInvalidPa

[GitHub] kafka pull request #2560: KAFKA-4494: Reduce startup and rebalance time

2017-02-16 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2560

KAFKA-4494: Reduce startup and rebalance time

Replace one-by-one initialization of state stores with bulk initialization.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4494

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2560.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2560


commit 47a8cecb3494e4496d5e243e633bbedf81f2a967
Author: Damian Guy 
Date:   2017-02-14T00:48:48Z

Bulk initialization of state stores




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4494) Significant startup delays in KStreams app

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870914#comment-15870914
 ] 

ASF GitHub Bot commented on KAFKA-4494:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2560

KAFKA-4494: Reduce startup and rebalance time

Replace one-by-one initialization of state stores with bulk initialization.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4494

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2560.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2560


commit 47a8cecb3494e4496d5e243e633bbedf81f2a967
Author: Damian Guy 
Date:   2017-02-14T00:48:48Z

Bulk initialization of state stores




> Significant startup delays in KStreams app
> --
>
> Key: KAFKA-4494
> URL: https://issues.apache.org/jira/browse/KAFKA-4494
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: AWS Linux ami, mac os
>Reporter: j yeargers
>Assignee: Damian Guy
>  Labels: performance
>
> Often, starting a KStreams-based app results in a significant (5-10 minute) 
> delay before stream processing begins. 
> Sample debug output: 
> https://gist.github.com/jyeargers/e8398fb353d67397f99148bc970479ee
> Topology in question: stream -> map -> groupbykey.aggregate -> print
> Stream is JSON.
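
For context, the reported topology is roughly the following minimal sketch against the 0.10.1 API; topic names, serdes, the key-extraction step, and the use of count() in place of the reported aggregate are assumptions:

{code}
KStreamBuilder builder = new KStreamBuilder();
builder.stream(Serdes.String(), Serdes.String(), "input-topic")
       .map((key, json) -> KeyValue.pair(extractKey(json), json))  // extractKey() is hypothetical
       .groupByKey(Serdes.String(), Serdes.String())
       .count("counts-store")
       .toStream()
       .print();
{code}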



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-16 Thread Apurva Mehta
Hi Jun,

Thanks for the reply. Comments inline.

On Thu, Feb 16, 2017 at 2:29 PM, Jun Rao  wrote:

> Hi, Apurva,
>
> Thanks for the reply. A couple of comment below.
>
> On Wed, Feb 15, 2017 at 9:45 PM, Apurva Mehta  wrote:
>
> > Hi Jun,
> >
> > Answers inline:
> >
> > 210. Pid snapshots: Is the number of pid snapshot configurable or
> hardcoded
> > > with 2? When do we decide to roll a new snapshot? Based on time, byte,
> or
> > > offset? Is that configurable too?
> > >
>


> When a replica becomes a follower, we do a bit of log truncation. Having an
> older snapshot allows us to recover the PID->sequence mapping much quicker
> than rescanning the whole log.


This is a good point. I have updated the doc with a more detailed proposal.
Essentially, snapshots will be created on a periodic basis. A reasonable
period would be every 30 or 60 seconds. We will keep at most 2 copies of
the snapshot file. With this setup, we would have to replay at most 60 or
120 seconds of the log in the event of log truncation during leader
failover.

If we need to make any of this configurable, we can expose a config in the
future. It would be easier to add a config we need than remove one with
marginal utility.
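
An illustrative sketch of that policy -- not broker code; writeSnapshot() and deleteAllButNewest() are hypothetical helpers:

{code}
// Snapshot the PID->sequence map every 60 seconds, retaining the two newest files.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    writeSnapshot(pidSequenceMap);   // hypothetical: serialize the map to a new file
    deleteAllButNewest(2);           // hypothetical: prune older snapshot files
}, 60, 60, TimeUnit.SECONDS);
{code}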


>
> > >
> > > 211. I am wondering if we should store ExpirationTime in the producer
> > > transactionalId mapping message as we do in the producer transaction
> > status
> > > message. If a producer only calls initTransactions(), but never
> publishes
> > > any data, we still want to be able to expire and remove the producer
> > > transactionalId mapping message.
> > >
> > >
> > Actually, the document was inaccurate. The transactionalId will be
> expired
> > only if there is no active transaction, and the age of the last
> transaction
> > with that transactionalId is older than the transactionalId expiration
> > time. With these semantics, storing the expiration time in the
> > transactionalId mapping message won't be useful, since the expiration
> time
> > is a moving target based on transaction activity.
> >
> > I have updated the doc with a clarification.
> >
> >
> >
> Currently, the producer transactionalId mapping message doesn't carry
> ExpirationTime, but the producer transaction status message does.  It would
> be useful if they are consistent.
>
>
You are right. The document has been updated to remove the ExpirationTime
from the transaction status messages as well. Any utility for this field
can be achieved by using the timestamp of the message itself along with
another expiration time (like transactionalId expiration time, transaction
expiration time, etc.).

Thanks,
Apurva


Should InvalidTopicException be called IllegalTopicException

2017-02-16 Thread Cosmin Lehene
It's a bit confusing to try to understand what the difference is between 
UnknownTopicOrPartitionException (retriable) and InvalidTopicException 
(non-retriable).


Looking through the code to figure out what counts as an invalid topic, I see most tests 
that throw it describe it as "illegal". This is also the case when trying to 
write to an internal topic.


While a longer discussion around having an invalid topic _name



Re: Should InvalidTopicException be called IllegalTopicException

2017-02-16 Thread Cosmin Lehene
Wow, I managed to send two incomplete messages thanks to the new Mac Touch Bar :)


I was suggesting either renaming it to IllegalTopicException or having one that 
clearly mentions the name, such as InvalidTopicNameException. However, I guess 
the first option would be the easiest.


Thanks,

Cosmin



From: Cosmin Lehene
Sent: Thursday, February 16, 2017 4:29:14 PM
To: dev@kafka.apache.org
Subject: Should InvalidTopicException be called IllegalTopicException


It's a bit confusing to try to understand what the difference is between 
UnknownTopicOrPartitionException (retriable) and InvalidTopicException 
(non-retriable).


Looking through the code to figure out what counts as an invalid topic, I see most tests 
that throw it describe it as "illegal". This is also the case when trying to 
write to an internal topic.


While a longer discussion around having an invalid topic _name



Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-16 Thread Ignacio Solis
-VarInts

I'm one of the people most opposed to VarInts, if not the most.  VarInts
have a place, but this is not it.  (We had a long discussion about
them at the beginning of the KIP-82 effort.)

If anybody has real-life performance numbers of VarInts improving
things or significantly reducing resources, I would like to know what
that case may be. Yes, you can save some bytes here and there, but
this is probably insignificant to the overall system behavior and
storage requirements.  -- I say this with respect to using VarInts in
the protocol itself, not as part of the data.

VarInts require you to parse the Int before using it and depending on
the encoding they can suffer from aliasing (multiple representations
for the same value).

Why add complexity?

Nacho


On Thu, Feb 16, 2017 at 10:29 AM, Colin McCabe  wrote:
> +1 for varints here-- it would save quite a bit of space.  They are
> pretty quick to implement as well.
>
> I think it makes sense for values to be byte arrays.  Users might want
> to attach arbitrary payloads; they shouldn't be forced to serialize
> everything to Java strings.
>
> best,
> Colin
>
>
> On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote:
>> Hey Michael,
>>
>> Hmm, I guess the point of representing it as bytes is to allow the broker
>> to pass it through opaquely? Is the cost of parsing them a concern, or
>> are
>> we simply trying to ensure that the broker stays agnostic to the format?
>>
>> On varints, I think adding support for them makes less sense for an
>> isolated use case, but as part of a more holistic change (such as what we
>> have proposed in KIP-98), I think they are justifiable. If we add them,
>> then the need to use attributes becomes quite a bit weaker, right? The
>> other thing I find slightly odd is the fact that null headers have no
>> actual
>> semantic meaning for the message (unlike null keys and values). It is
>> just
>> a space optimization. It seems a bit better to always use size 0 to
>> indicate having no headers.
>>
>> Overall, the main point is ensuring that the message schema remains
>> consistent, either within the larger protocol, or at a minimum within the
>> message itself.
>>
>> -Jason
>>
>> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce 
>> wrote:
>>
>> > Hi Jason,
>> >
>> > On point 1) in the message protocol the headers are simply a byte array,
>> > as like the key or value, this is to clearly demarcate the header in the
>> > core message. Then the header byte array in the core message is an array of
>> > key, value pairs. This is what it is denoting.
>> >
>> > Then this would be I guess in the given notation:
>> >
>> > Headers => [KeyLength, Key, ValueLength, Value]
>> > KeyLength   => int32  <-- NEW: size of the byte[] of the serialised key value
>> > Key         => bytes  <-- NEW: serialised string (UTF8) bytes of the header key
>> > ValueLength => int32  <-- NEW: size of the byte[] of the serialised header value
>> > Value       => bytes  <-- NEW: serialised form of the header value
>> >
>> > The key length and value length is matching the way the protocol is
>> > defined in the core message currently.
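
Concretely, serializing one header under the layout quoted above might look like this sketch (illustrative; no error handling):

{code}
byte[] key = headerKey.getBytes(StandardCharsets.UTF_8);
byte[] value = headerValue;  // already-serialized header value bytes
ByteBuffer buf = ByteBuffer.allocate(4 + key.length + 4 + value.length);
buf.putInt(key.length);      // KeyLength   => int32
buf.put(key);                // Key         => bytes
buf.putInt(value.length);    // ValueLength => int32
buf.put(value);              // Value       => bytes
{code}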
>> >
>> >
>> >
>> >
>> > On point 2)
>> > Var-sized ints were discussed much earlier on; in fact, I had
>> > suggested them myself (with Hadoop references). The complexity of this
>> > compared to having a simpler protocol was debated, and it was agreed it wasn’t worth
>> > the complexity, as all other clients in other languages would need to ensure
>> > they’re using the right var-size algorithm, as there are a few.
>> >
>> > On point 3)
>> > We took the attributes/optional approach because originally there was marked
>> > concern that headers would cause a message-size overhead for those who
>> > don’t want them. As such, this is the clean solution to achieve that. If
>> > that no longer holds, and we don’t care that we add 4 bytes of overhead, then
>> > I’m happy to remove it.
>> >
>> > I’m personally in favour of keeping the message as small as possible, so
>> > people don’t get shocks in performance and throughput due to message size
>> > unless they actively use the feature; as such, I do prefer the attribute
>> > bitwise feature-flag approach myself.
>> >
>> >
>> >
>> >
>> > On 16/02/2017, 05:40, "Jason Gustafson"  wrote:
>> >
>> > We have proposed a few significant changes to the message format in
>> > KIP-98
>> > which now seems likely to pass (perhaps with some iterations on
>> > implementation details). It would be good to try and coordinate the
>> > changes
>> > in both of the proposals to make sure they are consistent and
>> > compatible.
>> >
>> > I think using the attributes to indicate null headers is a reasonable
>> > approach. We have proposed to do the same thing for the message key and
>> > value. That said, I sympathize with Jay's argument. Having multiple
>> > ways to
>> > specify a null value increases the overall complexity of the protocol.
>

[GitHub] kafka pull request #2561: MINOR: Fix possible NPE in NodeApiVersions.toStrin...

2017-02-16 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2561

MINOR: Fix possible NPE in NodeApiVersions.toString



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka fix-npe-api-version-tostring

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2561.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2561


commit 99b487cf16a958b07d71baa4d9d0dcd6687a2dd3
Author: Jason Gustafson 
Date:   2017-02-17T00:36:56Z

MINOR: Fix possible NPE in NodeApiVersions.toString




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk8 #1280

2017-02-16 Thread Apache Jenkins Server
See 

Changes:

[jason] MINOR: Move compression stream construction into CompressionType

[wangguoz] KAFKA-4484: Set more conservative default values on RocksDB for 
memory

--
[...truncated 4110 lines...]

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
kafka.integration.SaslPlaintextTopicMetadataTest > testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-16 Thread Jason Gustafson
Hey Nacho,

I've compared performance of our KIP-98 implementation with and without
varints. For messages around 128 bytes, we see an increase in throughput of
about 30% using the default configuration settings. At 256 bytes, the
increase is around 16%. Obviously the performance converges as messages get
larger, but it seems well worth the cost. Note that we are also seeing a
substantial performance increase against trunk primarily because of the
much more efficient packing that varints provide us. Anything adding to
message overhead, such as record headers, would only increase the relative
difference. (Of course take these numbers with a grain of salt since I have
only used the default settings with both the producer and broker on my
local machine. We intend to provide more extensive performance details as
part of the work for KIP-98.)

The implementation we are using is from protobuf (
https://developers.google.com/protocol-buffers/docs/encoding), which is
also used in HBase. It is trivial to implement and as far as I know doesn't
suffer from the aliasing problem you are describing. I checked with Magnus
(the author of librdkafka) and he agreed that the savings seemed worth the
cost of implementation.
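
To make the encoding concrete: a protobuf-style unsigned varint stores 7 bits
of the value per byte and uses the high bit as a continuation flag, so values
under 128 fit in a single byte. A minimal sketch in Java follows (the class
and method names are illustrative, not KIP-98 code; for signed fields the
proposal, like protobuf, would zig-zag encode the value first):

// A sketch of protobuf-style unsigned varints: 7 data bits per byte, high bit
// set on every byte except the last. Names are illustrative, not Kafka API.
import java.nio.ByteBuffer;

public class VarintSketch {

    public static void writeUnsignedVarint(int value, ByteBuffer buf) {
        while ((value & 0xFFFFFF80) != 0) {      // more than 7 bits remain
            buf.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buf.put((byte) value);                   // final byte, high bit clear
    }

    public static int readUnsignedVarint(ByteBuffer buf) {
        int value = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(5);
        writeUnsignedVarint(300, buf);           // 0xAC 0x02: two bytes vs four for an int32
        buf.flip();
        System.out.println(readUnsignedVarint(buf)); // prints 300
    }
}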

-Jason

On Thu, Feb 16, 2017 at 4:32 PM, Ignacio Solis  wrote:

> -VarInts
>
> I'm one of the people (if not the most) opposed to VarInts.  VarInts
> have a place, but this is not it.   (We had a large discussion about
> them at the beginning of KIP-82 time)
>
> If anybody has real life performance numbers of VarInts improving
> things or significantly reducing resources I would like to know what
> that case may be. Yes, you can save some bytes here and there, but
> this is probably insignificant to the overall system behavior and
> storage requirements.  -- I say this with respect to using VarInts in
> the protocol itself, not as part of the data.
>
> VarInts require you to parse the Int before using it and depending on
> the encoding they can suffer from aliasing (multiple representations
> for the same value).
>
> Why add complexity?
>
> Nacho
>
>
> On Thu, Feb 16, 2017 at 10:29 AM, Colin McCabe  wrote:
> > +1 for varints here-- it would save quite a bit of space.  They are
> > pretty quick to implement as well.
> >
> > I think it makes sense for values to be byte arrays.  Users might want
> > to attach arbitrary payloads; they shouldn't be forced to serialize
> > everything to Java strings.
> >
> > best,
> > Colin
> >
> >
> > On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote:
> >> Hey Michael,
> >>
> >> Hmm, I guess the point of representing it as bytes is to allow the
> broker
> >> to pass it through opaquely? Is the cost of parsing them a concern, or
> >> are
> >> we simply trying to ensure that the broker stays agnostic to the format?
> >>
> >> On varints, I think adding support for them makes less sense for an
> >> isolated use case, but as part of a more holistic change (such as what
> we
> >> have proposed in KIP-98), I think they are justifiable. If we add them,
> >> then the need to use attributes becomes quite a bit weaker, right? The
> >> other thing I find slightly odd is the fact that null headers have no
> >> actual
> >> semantic meaning for the message (unlike null keys and values). It is
> >> just
> >> a space optimization. It seems a bit better to always use size 0 to
> >> indicate having no headers.
> >>
> >> Overall, the main point is ensuring that the message schema remains
> >> consistent, either within the larger protocol, or at a minimum within
> the
> >> message itself.
> >>
> >> -Jason
> >>
> >> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce 
> >> wrote:
> >>
> >> > Hi Jason,
> >> >
> >> > On point 1) in the message protocol the headers are simply a byte
> array,
> >> > as like the key or value, this is to clearly demarcate the header in
> the
> >> > core message. Then the header byte array in the core message is an
> array of
> >> > key, value pairs. This is what it is denoting.
> >> >
> >> > Then this would be I guess in the given notation:
> >> >
> >> > Headers => [KeyLength, Key, ValueLength, Value]
> >> >   KeyLength   => int32 <-- NEW: size of the byte[] of the serialised header key
> >> >   Key         => bytes <-- NEW: serialised string (UTF8) bytes of the header key
> >> >   ValueLength => int32 <-- NEW: size of the byte[] of the serialised header value
> >> >   Value       => bytes <-- NEW: serialised form of the header value
> >> >
> >> > The key length and value length is matching the way the protocol is
> >> > defined in the core message currently.
> >> >
> >> >
> >> >
> >> >
> >> > On point 2)
> >> > Var sized ints, this was discussed much earlier on, in fact I had
> >> > suggested it myself (with Hadoop references), the complexity of this
> >> > compared to having a simpler protocol was argued and agreed it wasn’t
> worth
> >> > the complexity as all other clients in other language
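
Read literally, the notation quoted above is a simple length-prefixed layout
per header. A minimal sketch of that layout in Java (class and method names
are illustrative only, not Kafka API):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Writes one header as [KeyLength, Key, ValueLength, Value] with int32 length
// prefixes, following the notation quoted above. Illustrative, not Kafka API.
public class HeaderLayoutSketch {

    public static void writeHeader(ByteBuffer buf, String key, byte[] value) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        buf.putInt(keyBytes.length);   // KeyLength
        buf.put(keyBytes);             // Key (UTF8 bytes)
        buf.putInt(value.length);      // ValueLength
        buf.put(value);                // Value (opaque bytes)
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        writeHeader(buf, "trace-id", new byte[] {0x01, 0x02});
        System.out.println("bytes used: " + buf.position());  // 4 + 8 + 4 + 2 = 18
    }
}

Note the fixed cost: two int32 prefixes mean even a tiny header pays 8 bytes
of framing, which is exactly the overhead the varint side of this thread aims
to shrink.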

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-16 Thread Jason Gustafson
Sorry, should have noted that the performance testing was done using the
producer performance tool shipped with Kafka.
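
For reference, a typical invocation of that tool looks like the following
(flag names as in the ProducerPerformance tool shipped under bin/; the topic
and bootstrap address are placeholders):

$ bin/kafka-producer-perf-test.sh --topic perf-test --num-records 1000000 \
    --record-size 128 --throughput -1 --producer-props bootstrap.servers=localhost:9092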

-Jason

On Thu, Feb 16, 2017 at 6:44 PM, Jason Gustafson  wrote:

> Hey Nacho,
>
> I've compared performance of our KIP-98 implementation with and without
> varints. For messages around 128 bytes, we see an increase in throughput of
> about 30% using the default configuration settings. At 256 bytes, the
> increase is around 16%. Obviously the performance converges as messages get
> larger, but it seems well worth the cost. Note that we are also seeing a
> substantial performance increase against trunk primarily because of the
> much more efficient packing that varints provide us. Anything adding to
> message overhead, such as record headers, would only increase the relative
> difference. (Of course take these numbers with a grain of salt since I have
> only used the default settings with both the producer and broker on my
> local machine. We intend to provide more extensive performance details as
> part of the work for KIP-98.)
>
> The implementation we are using is from protobuf (
> https://developers.google.com/protocol-buffers/docs/encoding), which is
> also used in HBase. It is trivial to implement and as far as I know doesn't
> suffer from the aliasing problem you are describing. I checked with Magnus
> (the author of librdkafka) and he agreed that the savings seemed worth the
> cost of implementation.
>
> -Jason
>
> On Thu, Feb 16, 2017 at 4:32 PM, Ignacio Solis  wrote:
>
>> -VarInts
>>
>> I'm one of the people (if not the most) opposed to VarInts.  VarInts
>> have a place, but this is not it.   (We had a large discussion about
>> them at the beginning of KIP-82 time)
>>
>> If anybody has real life performance numbers of VarInts improving
>> things or significantly reducing resources I would like to know what
>> that case may be. Yes, you can save some bytes here and there, but
>> this is probably insignificant to the overall system behavior and
>> storage requirements.  -- I say this with respect to using VarInts in
>> the protocol itself, not as part of the data.
>>
>> VarInts require you to parse the Int before using it and depending on
>> the encoding they can suffer from aliasing (multiple representations
>> for the same value).
>>
>> Why add complexity?
>>
>> Nacho
>>
>>
>> On Thu, Feb 16, 2017 at 10:29 AM, Colin McCabe 
>> wrote:
>> > +1 for varints here-- it would save quite a bit of space.  They are
>> > pretty quick to implement as well.
>> >
>> > I think it makes sense for values to be byte arrays.  Users might want
>> > to attach arbitrary payloads; they shouldn't be forced to serialize
>> > everything to Java strings.
>> >
>> > best,
>> > Colin
>> >
>> >
>> > On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote:
>> >> Hey Michael,
>> >>
>> >> Hmm, I guess the point of representing it as bytes is to allow the
>> broker
>> >> to pass it through opaquely? Is the cost of parsing them a concern, or
>> >> are
>> >> we simply trying to ensure that the broker stays agnostic to the
>> format?
>> >>
>> >> On varints, I think adding support for them makes less sense for an
>> >> isolated use case, but as part of a more holistic change (such as what
>> we
>> >> have proposed in KIP-98), I think they are justifiable. If we add them,
>> >> then the need to use attributes becomes quite a bit weaker, right? The
>> >> other thing I find slightly odd is the fact that null headers have no
>> >> actual
>> >> semantic meaning for the message (unlike null keys and values). It is
>> >> just
>> >> a space optimization. It seems a bit better to always use size 0 to
>> >> indicate having no headers.
>> >>
>> >> Overall, the main point is ensuring that the message schema remains
>> >> consistent, either within the larger protocol, or at a minimum within
>> the
>> >> message itself.
>> >>
>> >> -Jason
>> >>
>> >> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce
>> >> wrote:
>> >>
>> >> > Hi Jason,
>> >> >
>> >> > On point 1) in the message protocol the headers are simply a byte
>> array,
>> >> > as like the key or value, this is to clearly demarcate the header in
>> the
>> >> > core message. Then the header byte array in the core message is an
>> array of
>> >> > key, value pairs. This is what it is denoting.
>> >> >
>> >> > Then this would be I guess in the given notation:
>> >> >
>> >> > Headers => [KeyLength, Key, ValueLength, Value]
>> >> >   KeyLength   => int32 <-- NEW: size of the byte[] of the serialised header key
>> >> >   Key         => bytes <-- NEW: serialised string (UTF8) bytes of the header key
>> >> >   ValueLength => int32 <-- NEW: size of the byte[] of the serialised header value
>> >> >   Value       => bytes <-- NEW: serialised form of the header value
>> >> >
>> >> > The key length and value length is matching the way the protocol is
>> >> > defined in the core message currently.
>> 

[GitHub] kafka pull request #2562: MINOR: Update docstring for "offsets.retention.min...

2017-02-16 Thread omkreddy
GitHub user omkreddy opened a pull request:

https://github.com/apache/kafka/pull/2562

MINOR: Update docstring for "offsets.retention.minutes" config



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omkreddy/kafka MINOR-DOC

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2562.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2562


commit aa35747180f0c5296bbf46f0ee247acfecb06583
Author: Manikumar Reddy O 
Date:   2017-02-17T06:02:51Z

MINOR: Update doc for "offsets.retention.minutes" config




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-16 Thread Manikumar
Hi Jun,

Yes, we can just customize the rules to send the full principal name. I was
just thinking of using the PrincipalBuilder interface for implementing the
SASL rules as well, so that the interface stays consistent across protocols.
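
As a rough illustration of that idea, a custom builder under the current
PrincipalBuilder interface would look like the sketch below (the interface
shape is Kafka's; the pass-through class itself is hypothetical):

import java.security.Principal;
import java.util.Map;

import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.network.Authenticator;
import org.apache.kafka.common.network.TransportLayer;
import org.apache.kafka.common.security.auth.PrincipalBuilder;

// Hypothetical builder that preserves the full name of whatever Principal the
// transport negotiated, instead of collapsing it to a short name.
public class PassThroughPrincipalBuilder implements PrincipalBuilder {

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed for this sketch
    }

    @Override
    public Principal buildPrincipal(TransportLayer transportLayer,
                                    Authenticator authenticator) throws KafkaException {
        try {
            // the channel's authenticated principal, e.g. a full X500 name
            return transportLayer.peerPrincipal();
        } catch (Exception e) {
            throw new KafkaException("Failed to build principal", e);
        }
    }

    @Override
    public void close() throws KafkaException {
        // nothing to clean up
    }
}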

Thanks

On Fri, Feb 17, 2017 at 1:07 AM, Jun Rao  wrote:

> Hi, Radai, Mayuresh,
>
> Thanks for the explanation. Good point that a pluggable authorizer can
> customize how ACLs are added. However, earlier, Mayuresh was saying that in
> LinkedIn's customized authorizer, it's not possible to create a principal
> from a string. If that's the case, will adding the principal builder in
> kafka-acl.sh help? If the principal can be constructed from a string,
> wouldn't it be simpler to just let kafka-acl.sh do authorization based on
> that string name and not be aware of the principal builder? If you still
> think there is a need, perhaps you can add a more concrete use case that
> can't be done otherwise?
>
>
> Hi, Mani,
>
> For SASL, if the authorizer needs the full kerberos principal name,
> currently, the user can just customize "sasl.kerberos.principal.to.local.rules"
> to return the full principal name as the name for authorization, right?
>
> Thanks,
>
> Jun
>
> On Wed, Feb 15, 2017 at 10:25 AM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
> > @Jun thanks for the comments. Please see the replies inline.
> >
> > Currently kafka-acl.sh just creates an ACL path in ZK with the principal
> > name string.
> > > Yes, the kafka-acl.sh calls the addAcl() on the inbuilt
> > SimpleAclAuthorizer which in turn creates an ACL in ZK with the Principal
> > name string. This is because we supply the SimpleAclAuthorizer as a
> > commandline argument in the Kafka-acls.sh command.
> >
> > The authorizer module in the broker reads the principal name
> > string from the acl path in ZK and creates the expected KafkaPrincipal
> for
> > matching. As you can see, the expected principal is created on the broker
> > side, not by the kafka-acl.sh tool.
> > > This is considering the fact that the user is using the
> > SimpleAclAuthorizer on the broker side and not his own custom Authorizer.
> > The SimpleAclAuthorizer will take the Principal it gets from the Session
> > class . Currently the Principal is KafkaPrincipal. This KafkaPrincipal is
> > generated from the name of the actual channel Principal, in SocketServer
> > class when processing completed receives.
> > With this KIP, this will no longer be the case as the Session class will
> > store a java.security.Principal instead of specific KafkaPrincipal. So
> the
> > SimpleAclAuthorizer will construct the KafkaPrincipal from the channel
> > Principal it gets from the Session class.
> > User might not want to use the SimpleAclAuthorizer but use his/her own
> > custom Authorizer.
> >
> > The broker already has the ability to
> > configure PrincipalBuilder. That's why I am not sure if there is a need
> for
> > kafka-acl.sh to customize PrincipalBuilder.
> > > This is exactly the reason why we want to propose a
> PrincipalBuilder
> > in kafka-acls.sh so that the Principal generated by the PrincipalBuilder
> on
> > broker is consistent with that generated while creating ACLs using the
> > kafka-acls.sh command line tool.
> >
> >
> > *To summarize the above discussions:*
> > What if we only make the following changes: pass the java principal in
> > session and in
> > SimpleAclAuthorizer, construct KafkaPrincipal from the java principal name. Will
> > that work for LinkedIn?
> > --> Yes, this works for LinkedIn as we are not using the kafka-acls.sh
> > tool to create/update/add ACLs, for now.
> >
> > Do you think there is a use case for a customized authorizer and
> kafka-acl
> > at the
> > same time? If not, it's better not to complicate the kafka-acl api.
> > --> At LinkedIn, we don't use this tool for now. So we are fine with the
> > minimal change for now.
> >
> > Initially, our change was minimal, just getting Kafka to preserve the
> > channel principal. Since there was a discussion how kafka-acls.sh would
> > work with this change, on the ticket, we designed a detailed solution to
> > make this tool generally usable with all sorts of combinations of
> > Authorizers and PrincipalBuilders and give more flexibility to the end
> > users.
> > Without the changes proposed for kafka-acls.sh in this KIP, it cannot be
> > used with a custom Authorizer/PrincipalBuilder but will only work with
> > SimpleAclAuthorizer.
> >
> > Although I would actually like it to work for the general scenario, I am fine
> > with splitting it out into a separate KIP and limiting the scope of this KIP.
> > I will update the KIP accordingly and put this under rejected
> alternatives
> > and create a new KIP for the Kafka-acls.sh changes.
> >
> > @Manikumar
> > Since we are limiting the scope of this KIP by not making any changes to
> > kafka-acls.sh, I will cover your concern in a separate KIP that I will
> put
> > up for kafka-acls.sh. Does that work?
> >
> > Thanks,
> >
>
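
To make the flow Mayuresh describes concrete: under this KIP the Session
carries a plain java.security.Principal, and SimpleAclAuthorizer rebuilds a
KafkaPrincipal from its name. A sketch (KafkaPrincipal and its USER_TYPE
constant are real Kafka classes; the wrapper class is illustrative):

import java.security.Principal;

import org.apache.kafka.common.security.auth.KafkaPrincipal;

// Sketch: rebuild a KafkaPrincipal from the channel's java.security.Principal,
// as SimpleAclAuthorizer would under this KIP. Wrapper class is illustrative.
public class PrincipalMappingSketch {

    public static KafkaPrincipal toKafkaPrincipal(Principal channelPrincipal) {
        // "User" principal type plus the full name preserved from the channel
        return new KafkaPrincipal(KafkaPrincipal.USER_TYPE, channelPrincipal.getName());
    }
}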

Request to add to contributor list

2017-02-16 Thread Shun Takebayashi
Hi dev team,

I would like to be added as a contributor in the Kafka JIRA.
My JIRA ID is takebayashi.

Thanks,

Shun


Re: Apache Kafka Docker official image

2017-02-16 Thread Stevo Slavić
For an official Apache Kafka Docker image, I'd expect it to be published by
the Apache Kafka project from a Dockerfile and related resources living in
Kafka's sources; as with any official Kafka source/resource, those would be
subject to the Apache Kafka project's community processes and practices.

On Thu, Feb 16, 2017 at 9:24 PM, Gwen Shapira  wrote:

> I'm not sure what "official image" means here...
>
> An image that gets tested by the Apache Kafka system tests? An image
> that the PMC votes on as part of the release process? Wouldn't the
> Apache official image need to be hosted on the Apache repository?
>
> I'd love to know more on how other Apache projects manage an official
> docker image.
>
> Gwen
>
> On Thu, Feb 16, 2017 at 8:04 AM, Gianluca Privitera
>  wrote:
> > Hi,
> > I’m currently proposing an official image for Apache Kafka in the Docker
> library ( https://github.com/docker-library/official-images/pull/2627 ).
> > I wanted to know if someone from Kafka upstream is interested in taking
> over or if you are ok with me being the maintainer of the image.
> >
> > Let me know so I can speed up the process of the image approval.
> >
> > Thanks
> >
> > Gianluca Privitera
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>