Re: [DISCUSS] KIP-265: Make Windowed Serde to public APIs

2018-03-02 Thread Damian Guy
Thanks Guozhang, LGTM

On Fri, 2 Mar 2018 at 03:00 Hu Xi  wrote:

> Guozhang,
>
>
> Thanks for this KIP. Please help confirm questions below:
>
>   1.   Do we also need to retrofit `SimpleConsumerShell` to have it
> support these newly-added serdes?
>   2.   Does this KIP cover the changes for ConsoleConsumer as well?
>
>
> 
> From: Guozhang Wang
> Sent: March 2, 2018 8:20
> To: dev@kafka.apache.org
> Subject: [DISCUSS] KIP-265: Make Windowed Serde to public APIs
>
> Hello all,
>
> I'd like to have a discussion on making windowed serde to public APIs of
> Kafka Streams. It involves a couple of new configs, plus a few new public
> classes for windowed serializer and deserializer, and also adding the
> corresponding console consumer options in order to fetch from a topic
> written by a windowed store.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-265%3A+Make+Windowed+Serde+to+public+APIs
>
> I'd love to hear your opinions on the proposed APIs, and, if you have
> already encountered this and had to implement your own serdes, whether the
> current public API fits your needs.
>
>
> Thanks,
>
> -- Guozhang
>


Re: [kafka-clients] Re: [VOTE] 1.1.0 RC0

2018-03-02 Thread Damian Guy
Thanks Jun

On Fri, 2 Mar 2018 at 02:25 Jun Rao  wrote:

> KAFKA-6111 is now merged to 1.1 branch.
>
> Thanks,
>
> Jun
>
> On Thu, Mar 1, 2018 at 2:50 PM, Jun Rao  wrote:
>
>> Hi, Damian,
>>
>> It would also be useful to include KAFKA-6111, which prevents the
>> deleteLogDirEventNotifications path from being deleted correctly from
>> ZooKeeper. The patch should be committed
>> later today.
>>
>> Thanks,
>>
>> Jun
>>
>> On Thu, Mar 1, 2018 at 1:47 PM, Damian Guy  wrote:
>>
>>> Thanks Jason. Assuming the system tests pass, I'll cut RC1 tomorrow.
>>>
>>> Thanks,
>>> Damian
>>>
>>> On Thu, 1 Mar 2018 at 19:10 Jason Gustafson  wrote:
>>>
 The fix has been merged to 1.1.

 Thanks,
 Jason

 On Wed, Feb 28, 2018 at 11:35 AM, Damian Guy 
 wrote:

 > Hi Jason,
 >
 > Ok - thanks. Let me know how you get on.
 >
 > Cheers,
 > Damian
 >
 > On Wed, 28 Feb 2018 at 19:23 Jason Gustafson 
 wrote:
 >
 > > Hey Damian,
 > >
 > > I think we should consider
 > > https://issues.apache.org/jira/browse/KAFKA-6593
 > > for the release. I have a patch available, but still working on
 > validating
 > > both the bug and the fix.
 > >
 > > -Jason
 > >
 > > On Wed, Feb 28, 2018 at 9:34 AM, Matthias J. Sax <
 matth...@confluent.io>
 > > wrote:
 > >
 > > > No. Both will be released.
 > > >
 > > > -Matthias
 > > >
 > > > On 2/28/18 6:32 AM, Marina Popova wrote:
 > > > > Sorry, maybe a stupid question, but:
 > > > >  I see that Kafka 1.0.1 RC2 is still not released, but now
 1.1.0 RC0
 > is
 > > > coming up...
 > > > > Does it mean 1.0.1 will be abandoned and we should be looking
 forward
 > > to
 > > > 1.1.0 instead?
 > > > >
 > > > > thanks!
 > > > >
 > > > > ​Sent with ProtonMail Secure Email.​
 > > > >
 > > > > ‐‐‐ Original Message ‐‐‐
 > > > >
 > > > > On February 26, 2018 6:28 PM, Vahid S Hashemian <
 > > > vahidhashem...@us.ibm.com> wrote:
 > > > >
 > > > >> +1 (non-binding)
 > > > >>
 > > > >> Built the source and ran quickstart (including streams) successfully on
 > > > >> Ubuntu (with both Java 8 and Java 9).
 > > > >>
 > > > >> I understand the Windows platform is not officially supported, but I ran
 > > > >> the same on Windows 10, and except for Step 7 (Connect) everything else
 > > > >> worked fine.
 > > > >>
 > > > >> There are a number of warnings and errors (including
 > > > >> java.lang.ClassNotFoundException). Here's the final error message:
 > > > >>
 > > > >>> bin\windows\connect-standalone.bat config\connect-standalone.properties
 > > > >>> config\connect-file-source.properties config\connect-file-sink.properties
 > > > >>
 > > > >> ...
 > > > >>
 > > > >> [2018-02-26 14:55:56,529] ERROR Stopping after connector error
 > > > >> (org.apache.kafka.connect.cli.ConnectStandalone)
 > > > >> java.lang.NoClassDefFoundError:
 > > > >> org/apache/kafka/connect/transforms/util/RegexValidator
 > > > >> at org.apache.kafka.connect.runtime.SinkConnectorConfig.<clinit>(SinkConnectorConfig.java:46)
 > > > >> at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:263)
 > > > >> at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:164)
 > > > >> at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:107)
 > > > >> Caused by: java.lang.ClassNotFoundException:
 > > > >> org.apache.kafka.connect.transforms.util.RegexValidator
 > > > >> at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
 > > > >> at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:185)
 > > > >> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
 > > > >> ... 4 more
 > > > >>
 > > > >> Thanks for running the release.
 > > > >>
 > > > >> --Vahid
 > > > >>
 > > > >> From: Damian Guy damian@gmail.com
 > > > >> To: dev@kafka.apache.org, us...@kafka.apache.org,
 > > > >> kafka-clie...@googlegroups.com
 > > > >> Date: 02/24/2018 08:16 AM
 > > > >>
 

Re: [DISCUSS] KIP-265: Make Windowed Serde to public APIs

2018-03-02 Thread Guozhang Wang
@Xi:

1. SimpleConsumerShell is marked as deprecated in 0.11.0 and will be
removed in a future release, so I did not consider it in this KIP.
2. Yes, it is covered in the Public Interfaces section of the wiki; please
read the section and let me know what you think (it is slightly different
from your proposed PR).
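
For readers following along, here is a rough sketch of what consuming a topic
written by a windowed store could look like once the serdes are public. Class
names follow the KIP wiki (e.g. TimeWindowedDeserializer); the topic name and
exact signatures are illustrative until the released API is final:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.TimeWindowedDeserializer;
import org.apache.kafka.streams.kstream.Windowed;

public class WindowedTopicReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "windowed-topic-reader");

        // Windowed keys wrap an inner (String) deserializer, mirroring how the
        // windowed store serialized them.
        Deserializer<Windowed<String>> keyDeserializer =
                new TimeWindowedDeserializer<>(Serdes.String().deserializer());
        Deserializer<Long> valueDeserializer = Serdes.Long().deserializer();

        try (KafkaConsumer<Windowed<String>, Long> consumer =
                     new KafkaConsumer<>(props, keyDeserializer, valueDeserializer)) {
            consumer.subscribe(Collections.singletonList("word-counts-by-window"));
            for (ConsumerRecord<Windowed<String>, Long> record : consumer.poll(1000L)) {
                System.out.printf("key=%s windowStart=%d count=%d%n",
                        record.key().key(), record.key().window().start(), record.value());
            }
        }
    }
}
```

The console consumer options mentioned in the KIP would essentially wire up the
same deserializers from the command line.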


Guozhang



On Fri, Mar 2, 2018 at 3:43 AM, Damian Guy  wrote:

> Thanks Guozhang, LGTM
>
> On Fri, 2 Mar 2018 at 03:00 Hu Xi  wrote:
>
> > Guozhang,
> >
> >
> > Thanks for this KIP. Please help confirm questions below:
> >
> >   1.   Do we also need to retrofit `SimpleConsumerShell` to have it
> > support these newly-added serdes?
> >   2.   Does this KIP cover the changes for ConsoleConsumer as well?
> >
> >
> > 
> > From: Guozhang Wang
> > Sent: March 2, 2018 8:20
> > To: dev@kafka.apache.org
> > Subject: [DISCUSS] KIP-265: Make Windowed Serde to public APIs
> >
> > Hello all,
> >
> > I'd like to have a discussion on making windowed serde to public APIs of
> > Kafka Streams. It involves a couple of new configs, plus a few new public
> > classes for windowed serializer and deserializer, and also adding the
> > corresponding console consumer options in order to fetch from a topic
> > written by a windowed store.
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 265%3A+Make+Windowed+Serde+to+public+APIs
> >
> > I'd love to hear from your opinions on the proposed APIs, and if you have
> > already encountered this and have to implement your own serdes, does the
> > current public API fit your needs.
> >
> >
> > Thanks,
> >
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: [VOTE] 1.0.1 RC2

2018-03-02 Thread Ewen Cheslack-Postava
Thanks everyone for voting. This passes with 3 binding +1, 5 non-binding
+1, and no dissenting votes.

I'll work on getting the release finalized and send out an announcement
when it is ready.

-Ewen

On Tue, Feb 27, 2018 at 11:18 PM, Jason Gustafson 
wrote:

> +1. Verified artifacts and ran the basic quickstart.
>
> -Jason
>
> On Mon, Feb 26, 2018 at 1:08 AM, Manikumar 
> wrote:
>
> > +1 (non-binding)
> > Built src and ran tests
> > Ran core quick start
> >
> > On Sat, Feb 24, 2018 at 8:44 PM, Jakub Scholz  wrote:
> >
> > > +1 (non-binding) ... I used the Scala 2.12 binaries and ran my tests
> with
> > > producers / consumers.
> > >
> > > On Thu, Feb 22, 2018 at 1:06 AM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the third candidate for release of Apache Kafka 1.0.1.
> > > >
> > > > This is a bugfix release for the 1.0 branch that was first released
> > with
> > > > 1.0.0 about 3 months ago. We've fixed 49 issues since that release.
> > Most
> > > of
> > > > these are non-critical, but in aggregate these fixes will have
> > > significant
> > > > impact. A few of the more significant fixes include:
> > > >
> > > > * KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
> > > > plugins
> > > > * KAFKA-6185: Selector memory leak with high likelihood of OOM in
> case
> > of
> > > > down conversion
> > > > * KAFKA-6269: KTable state restore fails after rebalance
> > > > * KAFKA-6190: GlobalKTable never finishes restoring when consuming
> > > > transactional messages
> > > > * KAFKA-6529: Stop file descriptor leak when client disconnects with
> > > staged
> > > > receives
> > > > * KAFKA-6238: Issues with protocol version when applying a rolling
> > > upgrade
> > > > to 1.0.0
> > > >
> > > > Release notes for the 1.0.1 release:
> > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc2/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote by Saturday Feb 24, 9pm PT ***
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > http://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc2/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > > https://repository.apache.org/content/groups/staging/
> > > >
> > > > * Javadoc:
> > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc2/javadoc/
> > > >
> > > > * Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
> > > > https://github.com/apache/kafka/tree/1.0.1-rc2
> > > >
> > > > * Documentation:
> > > > http://kafka.apache.org/10/documentation.html
> > > >
> > > > * Protocol:
> > > > http://kafka.apache.org/10/protocol.html
> > > >
> > > > /**
> > > >
> > > > Thanks,
> > > > Ewen Cheslack-Postava
> > > >
> > >
> >
>


Re: [VOTE] 1.0.1 RC2

2018-03-02 Thread Ismael Juma
Thanks for running the release Ewen!

Ismael

On Fri, Mar 2, 2018 at 10:10 AM, Ewen Cheslack-Postava 
wrote:

> Thanks everyone for voting. This passes with 3 binding +1, 5 non-binding
> +1, and no dissenting votes.
>
> I'll work on getting the release finalized and send out an announcement
> when it is ready.
>
> -Ewen
>
> On Tue, Feb 27, 2018 at 11:18 PM, Jason Gustafson 
> wrote:
>
> > +1. Verified artifacts and ran the basic quickstart.
> >
> > -Jason
> >
> > On Mon, Feb 26, 2018 at 1:08 AM, Manikumar 
> > wrote:
> >
> > > +1 (non-binding)
> > > Built src and ran tests
> > > Ran core quick start
> > >
> > > On Sat, Feb 24, 2018 at 8:44 PM, Jakub Scholz  wrote:
> > >
> > > > +1 (non-binding) ... I used the Scala 2.12 binaries and run my tests
> > with
> > > > producers / consumers.
> > > >
> > > > On Thu, Feb 22, 2018 at 1:06 AM, Ewen Cheslack-Postava <
> > > e...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hello Kafka users, developers and client-developers,
> > > > >
> > > > > This is the third candidate for release of Apache Kafka 1.0.1.
> > > > >
> > > > > This is a bugfix release for the 1.0 branch that was first released
> > > with
> > > > > 1.0.0 about 3 months ago. We've fixed 49 issues since that release.
> > > Most
> > > > of
> > > > > these are non-critical, but in aggregate these fixes will have
> > > > significant
> > > > > impact. A few of the more significant fixes include:
> > > > >
> > > > > * KAFKA-6277: Make loadClass thread-safe for class loaders of
> Connect
> > > > > plugins
> > > > > * KAFKA-6185: Selector memory leak with high likelihood of OOM in
> > case
> > > of
> > > > > down conversion
> > > > > * KAFKA-6269: KTable state restore fails after rebalance
> > > > > * KAFKA-6190: GlobalKTable never finishes restoring when consuming
> > > > > transactional messages
> > > > > * KAFKA-6529: Stop file descriptor leak when client disconnects
> with
> > > > staged
> > > > > receives
> > > > > * KAFKA-6238: Issues with protocol version when applying a rolling
> > > > upgrade
> > > > > to 1.0.0
> > > > >
> > > > > Release notes for the 1.0.1 release:
> > > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc2/RELEASE_NOTES.html
> > > > >
> > > > > *** Please download, test and vote by Saturday Feb 24, 9pm PT ***
> > > > >
> > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > > http://kafka.apache.org/KEYS
> > > > >
> > > > > * Release artifacts to be voted upon (source and binary):
> > > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc2/
> > > > >
> > > > > * Maven artifacts to be voted upon:
> > > > > https://repository.apache.org/content/groups/staging/
> > > > >
> > > > > * Javadoc:
> > > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc2/javadoc/
> > > > >
> > > > > * Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
> > > > > https://github.com/apache/kafka/tree/1.0.1-rc2
> > > > >
> > > > > * Documentation:
> > > > > http://kafka.apache.org/10/documentation.html
> > > > >
> > > > > * Protocol:
> > > > > http://kafka.apache.org/10/protocol.html
> > > > >
> > > > > /**
> > > > >
> > > > > Thanks,
> > > > > Ewen Cheslack-Postava
> > > > >
> > > >
> > >
> >
>


Build failed in Jenkins: kafka-trunk-jdk8 #2448

2018-03-02 Thread Apache Jenkins Server
See 

--
[...truncated 416.36 KB...]
kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnUnsupportedIfNoEpochRecorded PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliest STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliest PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPersistEpochsBetweenInstances STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPersistEpochsBetweenInstances PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotClearAnythingIfOffsetToFirstOffset STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotClearAnythingIfOffsetToFirstOffset PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotLetOffsetsGoBackwardsEvenIfEpochsProgress STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotLetOffsetsGoBackwardsEvenIfEpochsProgress PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldGetFirstOffsetOfSubsequentEpochWhenOffsetRequestedForPreviousEpoch STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldGetFirstOffsetOfSubsequentEpochWhenOffsetRequestedForPreviousEpoch PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest2 STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest2 PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearEarliestOnEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearEarliestOnEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPreserveResetOffsetOnClearEarliestIfOneExists STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldPreserveResetOffsetOnClearEarliestIfOneExists PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldUpdateOffsetBetweenEpochBoundariesOnClearEarliest PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnInvalidOffsetIfEpochIsRequestedWhichIsNotCurrentlyTracked STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldReturnInvalidOffsetIfEpochIsRequestedWhichIsNotCurrentlyTracked PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldFetchEndOffsetOfEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldFetchEndOffsetOfEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliestAndUpdateItsOffset STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldRetainLatestEpochOnClearAllEarliestAndUpdateItsOffset PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearAllEntries STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearAllEntries PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearLatestOnEmptyCache 
STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > shouldClearLatestOnEmptyCache 
PASSED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryHeadIfUndefinedPassed STARTED

kafka.server.epoch.LeaderEpochFileCacheTest > 
shouldNotResetEpochHistoryHeadIfUndefinedPassed PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldIncreaseLeaderEpochBetweenLeaderRestarts STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldIncreaseLeaderEpochBetweenLeaderRestarts PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader PASSED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldSendLeaderEpochRequestAndGetAResponse STARTED

kafka.server.epoch.LeaderEpochIntegrationTest > 
shouldSendLeaderEpochRequestAndGetAResponse PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > shouldGetEpochsFromReplica 
STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > shouldGetEpochsFromReplica PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnUnknownTopicOrPartitionIfThrown STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnUnknownTopicOrPartitionIfThrown PASSED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnNoLeaderForPartitionIfThrown STARTED

kafka.server.epoch.OffsetsForLeaderEpochTest > 
shouldReturnNoLeaderForPartitionIfThrown PASSED

kafka.server.epoch.EpochDrivenReplicationProtocolAcceptanceTest > 
shouldSurviveFastLeaderChange STARTED

kafka.server.epoch.EpochDrivenReplicationProtocolAcceptanceTest > 
shouldSurviveFastLeaderChange PASSED

kafka.server.epoch.EpochDrivenReplicationProtocolAcceptanceTest > 
offsetsShouldNotGoBackwards STARTED

kafka.server.epoch.EpochDrivenReplicationProtocolAcceptanceTest > 
offsetsShouldNotGoBackwards PASSED

kafka.server.epoch.EpochDrivenReplicationProtocolA

Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-03-02 Thread Jay Kreps
Hey Dong,

Cool, obviously any solution here would need to work with Connect and
Streams to be viable.

On the linear hashing thing, what I am talking about is something
different. I am talking about splitting existing partitions incrementally.
E.g. if you have 100 partitions and want to move to 110. Obviously a naive
approach which added partitions would require you to reshuffle all data as
the hashing of all data would change. A linear hashing-like scheme gives an
approach by which you can split individual partitions one at a time to
avoid needing to reshuffle much data. This approach has the benefit that at
any time you have a fixed number of partitions and all data is fully
partitioned with whatever partition count you choose, while also letting you
dynamically scale the partition count up or down.
This seems like it simplifies things like log compaction etc.
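
To make the splitting idea concrete, here is a minimal sketch (not from the
KIP; names and hashing details are illustrative) of how a linear-hashing-style
partitioner maps keys while a topic is partway through growing from 100 to 110
partitions:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LinearHashingPartitioner {
    private final int initialPartitions;  // partition count before splitting began, e.g. 100
    private final int currentPartitions;  // current count, between N and 2N, e.g. 110

    public LinearHashingPartitioner(int initialPartitions, int currentPartitions) {
        this.initialPartitions = initialPartitions;
        this.currentPartitions = currentPartitions;
    }

    public int partition(byte[] key) {
        int hash = Arrays.hashCode(key) & 0x7fffffff;
        // Hash into the doubled space first; partitions that have not been split yet
        // fall back to the original modulus, so their keys stay where they were.
        int p = hash % (2 * initialPartitions);
        if (p >= currentPartitions) {
            p = hash % initialPartitions;
        }
        return p;
    }

    public static void main(String[] args) {
        LinearHashingPartitioner partitioner = new LinearHashingPartitioner(100, 110);
        System.out.println(partitioner.partition("some-key".getBytes(StandardCharsets.UTF_8)));
    }
}
```

With this scheme only keys that hashed into the ten partitions split so far
move (into their new split images); every other key keeps its old assignment.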

-Jay

On Sun, Feb 25, 2018 at 3:51 PM, Dong Lin  wrote:

> Hey Jay,
>
> Thanks for the comment!
>
> I have not specifically thought about how this works with Streams and
> Connect. The current KIP is about the interface that our producer and
> consumer expose to the user. It ensures that if there are two messages
> with the same key produced by the same producer, say messageA and messageB,
> and suppose messageB is produced after messageA to a different partition
> than messageA, then we can guarantee that the following sequence can happen
> in order:
>
> - Consumer of messageA can execute callback, in which user can flush state
> related to the key of messageA.
> - messageA is delivered by its consumer to the application
> - Consumer of messageB can execute callback, in which user can load the
> state related to the key of messageB.
> - messageB is delivered by its consumer to the application.
>
> So it seems that it should support Streams and Connect properly. But I am
> not entirely sure because I have not looked into how Streams and Connect
> works. I can think about it more if you can provide an example where this
> does not work for Streams and Connect.
>
> Regarding the second question, I think linear hashing approach provides a
> way to reduce the number of partitions that can "conflict" with a given
> partition to *log_2(n)*, as compared to *n* in the current KIP, where n is
> the total number of partitions of the topic. This will be useful when
> number of partition is large and asymptotic complexity matters.
>
> I personally don't think this optimization is worth the additional
> complexity in Kafka. This is because partition expansion or deletion should
> happen infrequently and the largest number of partitions of a single topic
> today is not that large -- probably 1000 or less. And when partitions of a
> topic changes, each consumer will likely need to query and wait for
> positions of a large percentage of partitions of the topic anyway even with
> this optimization. I think this algorithm is kind of orthogonal to this
> KIP. We can extend the KIP to support this algorithm in the future as well.
>
> Thanks,
> Dong
>
> On Thu, Feb 22, 2018 at 5:19 PM, Jay Kreps  wrote:
>
> > Hey Dong,
> >
> > Two questions:
> > 1. How will this work with Streams and Connect?
> > 2. How does this compare to a solution where we physically split
> partitions
> > using a linear hashing approach (the partition number is equivalent to
> the
> > hash bucket in a hash table)? https://en.wikipedia.org/wiki/
> Linear_hashing
> >
> > -Jay
> >
> > On Sat, Feb 10, 2018 at 3:35 PM, Dong Lin  wrote:
> >
> > > Hi all,
> > >
> > > I have created KIP-253: Support in-order message delivery with
> partition
> > > expansion. See
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 253%3A+Support+in-order+message+delivery+with+partition+expansion
> > > .
> > >
> > > This KIP provides a way to allow messages of the same key from the same
> > > producer to be consumed in the same order they are produced even if we
> > > expand partition of the topic.
> > >
> > > Thanks,
> > > Dong
> > >
> >
>


[jira] [Created] (KAFKA-6606) Regression in consumer auto-commit backoff behavior

2018-03-02 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-6606:
--

 Summary: Regression in consumer auto-commit backoff behavior
 Key: KAFKA-6606
 URL: https://issues.apache.org/jira/browse/KAFKA-6606
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Jason Gustafson
 Fix For: 1.1.0


We introduced a regression in the auto-commit behavior in KAFKA-6362. After 
initiating a send, the consumer does not reset its next commit deadline, so it 
will send auto-commits as fast as the user can call poll() until the first 
offset commit returns.
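
A minimal sketch of the intended deadline bookkeeping (illustrative; this is
not the actual ConsumerCoordinator code): the deadline has to be pushed forward
as soon as an auto-commit is initiated, not only once its response arrives.

```java
public class AutoCommitTimer {
    private final long intervalMs;
    private long nextDeadlineMs;

    public AutoCommitTimer(long intervalMs, long nowMs) {
        this.intervalMs = intervalMs;
        this.nextDeadlineMs = nowMs + intervalMs;
    }

    /** Called from poll(): returns true if an async auto-commit should be sent now. */
    public boolean shouldAutoCommit(long nowMs) {
        if (nowMs < nextDeadlineMs) {
            return false;
        }
        // Reset the deadline at send time. Without this reset (the regression),
        // every subsequent poll() fires another commit until the first one returns.
        nextDeadlineMs = nowMs + intervalMs;
        return true;
    }

    public static void main(String[] args) {
        AutoCommitTimer timer = new AutoCommitTimer(5_000L, 0L);
        System.out.println(timer.shouldAutoCommit(1_000L));  // false: interval not elapsed
        System.out.println(timer.shouldAutoCommit(6_000L));  // true: commit sent, deadline reset
        System.out.println(timer.shouldAutoCommit(6_500L));  // false: no commit storm
    }
}
```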



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : kafka-0.11.0-jdk7 #361

2018-03-02 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2018-03-02 Thread Vahid S Hashemian
Hi Jason,

I'm thinking through some of the details of the KIP with respect to your 
feedback and the decision to keep the expire-timestamp for each group 
partition in the offset message.

On point #1 below: since the expiration timer starts ticking after the 
group becomes Empty, the expire_timestamp of the group's offsets will be set 
when that transition occurs. In the normal case that expire_timestamp is 
calculated as "current timestamp" + "broker's offset retention". If an old 
client provides a custom retention, we probably need a way to store that 
custom retention (and use it once the group becomes Empty). One place to 
store it is the group metadata message, but the issue is that we would be 
introducing a new field only for backward compatibility (new clients don't 
overwrite the broker's retention), unless we somehow want to support this 
retention on a per-group basis. What do you think?
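
For illustration only (this does not propose any new field or config beyond
what is discussed above), the computation being debated boils down to
something like:

```java
public class OffsetExpiration {
    /**
     * Illustrative sketch: compute the expire_timestamp for a group's offsets at the
     * moment the group transitions to Empty. If an old client supplied a custom
     * retention, it would have to be stored somewhere (e.g. the group metadata) so it
     * can be applied here; new clients simply fall back to the broker's retention.
     */
    static long expireTimestampOnEmpty(long nowMs, Long clientRetentionMs, long brokerRetentionMs) {
        long retentionMs = (clientRetentionMs != null) ? clientRetentionMs : brokerRetentionMs;
        return nowMs + retentionMs;
    }

    public static void main(String[] args) {
        long brokerRetentionMs = 7L * 24 * 60 * 60 * 1000;  // e.g. offsets.retention.minutes, in ms
        System.out.println(expireTimestampOnEmpty(System.currentTimeMillis(), null, brokerRetentionMs));
    }
}
```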

On point #3: as you mentioned, there is currently no "notification" 
mechanism in place for GroupMetadataManager when a subscription change 
occurs. The member subscriptions are, however, available in the group 
metadata, so a poll approach could be used to check group subscriptions on a 
regular basis and expire stale offsets (if there are topics the group is no 
longer subscribed to). This can be done as part of the offset-cleanup 
scheduled task, which by default does not run very frequently. Were you 
thinking of a different method for capturing the subscription change?
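
A rough sketch of that poll-based cleanup (illustrative data structures, not
the actual GroupMetadataManager code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StaleOffsetCleanup {
    public static void main(String[] args) {
        // Topics the group's members currently subscribe to (from group metadata).
        Set<String> subscribedTopics = new HashSet<>();
        subscribedTopics.add("orders");

        // Committed offsets the coordinator holds for the group, keyed by topic.
        Map<String, Long> committedOffsetsByTopic = new HashMap<>();
        committedOffsetsByTopic.put("orders", 42L);
        committedOffsetsByTopic.put("legacy-topic", 17L);  // group no longer subscribed

        // Run as part of the periodic offset-cleanup task: drop offsets for topics
        // the group is no longer subscribed to.
        committedOffsetsByTopic.keySet().removeIf(topic -> !subscribedTopics.contains(topic));

        System.out.println(committedOffsetsByTopic);  // {orders=42}
    }
}
```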

Thanks.
--Vahid




From:   Jason Gustafson 
To: dev@kafka.apache.org
Date:   02/18/2018 01:16 PM
Subject:Re: [DISCUSS] KIP-211: Revise Expiration Semantics of 
Consumer Group Offsets



Hey Vahid,

Sorry for the late response. The KIP looks good. A few comments:

1. I'm not quite sure I understand how you are handling old clients. It
sounds like you are saying that old clients need to change configuration?
I'd suggest 1) if an old client requests the default expiration, then we
use the updated behavior, and 2) if the old client requests a specific
expiration, we enforce it from the time the group becomes Empty.

2. Does this require a new version of the group metadata message format? 
I
think we need to add a new field to indicate the time that the group state
changed to Empty. This will allow us to resume the expiration timer
correctly after a coordinator change. Alternatively, we could reset the
expiration timeout after every coordinator move, but it would be nice to
have a definite bound on offset expiration.

3. The question about removal of offsets for partitions which are no 
longer
in use is interesting. At the moment, it's difficult for the coordinator 
to
know that a partition is no longer being fetched because it is agnostic to
subscription state (the group coordinator is used for more than just
consumer groups). Even if we allow the coordinator to read subscription
state to tell which topics are no longer being consumed, we might need 
some
additional bookkeeping to keep track of /when/ the consumer stopped
subscribing to a particular topic. Or maybe we can reset this expiration
timer after every coordinator change when the new coordinator reads the
offsets and group metadata? I am not sure how common this use case is and
whether it needs to be solved as part of this KIP.

Thanks,
Jason



On Thu, Feb 1, 2018 at 12:40 PM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Thanks James for sharing that scenario.
>
> I agree it makes sense to be able to remove offsets for the topics that
> are no longer "active" in the group.
> I think it becomes important to determine what it means for a topic to no
> longer be active: if we use per-partition expiration we would manually
> choose a retention time that works for the particular scenario.
>
> That works, but since we are manually intervening and specify a
> per-partition retention, why not do the intervention in some other way:
>
> One alternative for this intervention, to favor the simplicity of the
> suggested protocol in the KIP, is to improve upon the just introduced
> DELETE_GROUPS API and allow for deletion of offsets of specific topics 
in
> the group. This is what the old ZooKeeper based group management 
supported
> anyway, and we would just be leveling the group deletion features of the
> Kafka-based group management with the ZooKeeper-based one.
>
> So, instead of deciding in advance when the offsets should be removed we
> would instantly remove them when we are sure that they are no longer
> needed.
>
> Let me know what you think.
>
> Thanks.
> --Vahid
>
>
>
> From:   James Cheng 
> To: dev@kafka.apache.org
> Date:   02/01/2018 12:37 AM
> Subject:Re: [DISCUSS] KIP-211: Revise Expiration Semantics of
> Consumer Group Offsets
>
>
>
> Vahid,
>
> Under rejected alternatives, we had decided that we did NOT want to do
> per-partition expiration, and instead we wait until the entire group is
> empty and then (after the right time has passed) expire the entire

[jira] [Created] (KAFKA-6607) Kafka Streams lag not zero when input topic transactional

2018-03-02 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-6607:
--

 Summary: Kafka Streams lag not zero when input topic transactional
 Key: KAFKA-6607
 URL: https://issues.apache.org/jira/browse/KAFKA-6607
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 1.0.0
Reporter: Matthias J. Sax


When an input topic for a Kafka Streams application is written using 
transactions, Kafka Streams does not commit "endOffset" but "endOffset - 1" 
when it reaches the end of the topic. The reason is the commit marker that is 
the last "message" in the topic; Streams commits "offset of last processed 
message plus 1" and does not take commit markers into account.

This is not a correctness issue, but when one inspects the consumer lag via 
{{bin/kafka-consumer-groups.sh}} the lag is shown as 1 instead of 0, which is 
correct from the consumer-group tool's point of view.
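
A worked example with hypothetical numbers makes the off-by-one visible:

```java
public class TransactionalLagExample {
    public static void main(String[] args) {
        // A transactional producer writes 10 records (offsets 0-9) and commits;
        // the commit marker occupies offset 10, so the log end offset is 11.
        long logEndOffset = 11;
        long lastProcessedOffset = 9;                      // last real record Streams saw
        long committedOffset = lastProcessedOffset + 1;    // "last processed plus 1" = 10
        long reportedLag = logEndOffset - committedOffset; // = 1, never reaches 0
        System.out.println("lag = " + reportedLag);
    }
}
```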



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-trunk-jdk9 #444

2018-03-02 Thread Apache Jenkins Server
See 

--
[...truncated 1.24 MB...]

kafka.security.auth.ResourceTypeTest > testJavaConversions PASSED

kafka.security.auth.ResourceTypeTest > testFromString STARTED

kafka.security.auth.ResourceTypeTest > testFromString PASSED

kafka.security.token.delegation.DelegationTokenManagerTest > 
testPeriodicTokenExpiry STARTED

kafka.security.token.delegation.DelegationTokenManagerTest > 
testPeriodicTokenExpiry PASSED

kafka.security.token.delegation.DelegationTokenManagerTest > 
testTokenRequestsWithDelegationTokenDisabled STARTED

kafka.security.token.delegation.DelegationTokenManagerTest > 
testTokenRequestsWithDelegationTokenDisabled PASSED

kafka.security.token.delegation.DelegationTokenManagerTest > testDescribeToken 
STARTED

kafka.security.token.delegation.DelegationTokenManagerTest > testDescribeToken 
PASSED

kafka.security.token.delegation.DelegationTokenManagerTest > testCreateToken 
STARTED

kafka.security.token.delegation.DelegationTokenManagerTest > testCreateToken 
PASSED

kafka.security.token.delegation.DelegationTokenManagerTest > testExpireToken 
STARTED

kafka.security.token.delegation.DelegationTokenManagerTest > testExpireToken 
PASSED

kafka.security.token.delegation.DelegationTokenManagerTest > testRenewToken 
STARTED

kafka.security.token.delegation.DelegationTokenManagerTest > testRenewToken 
PASSED

kafka.producer.SyncProducerTest > testReachableServer STARTED

kafka.producer.SyncProducerTest > testReachableServer PASSED

kafka.producer.SyncProducerTest > testMessageSizeTooLarge STARTED

kafka.producer.SyncProducerTest > testMessageSizeTooLarge PASSED

kafka.producer.SyncProducerTest > testNotEnoughReplicas STARTED

kafka.producer.SyncProducerTest > testNotEnoughReplicas PASSED

kafka.producer.SyncProducerTest > testMessageSizeTooLargeWithAckZero STARTED

kafka.producer.SyncProducerTest > testMessageSizeTooLargeWithAckZero PASSED

kafka.producer.SyncProducerTest > testProducerCanTimeout STARTED

kafka.producer.SyncProducerTest > testProducerCanTimeout PASSED

kafka.producer.SyncProducerTest > testProduceRequestWithNoResponse STARTED

kafka.producer.SyncProducerTest > testProduceRequestWithNoResponse PASSED

kafka.producer.SyncProducerTest > testEmptyProduceRequest STARTED

kafka.producer.SyncProducerTest > testEmptyProduceRequest PASSED

kafka.producer.SyncProducerTest > testProduceCorrectlyReceivesResponse STARTED

kafka.producer.SyncProducerTest > testProduceCorrectlyReceivesResponse PASSED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic STARTED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic PASSED

kafka.producer.AsyncProducerTest > testQueueTimeExpired STARTED

kafka.producer.AsyncProducerTest > testQueueTimeExpired PASSED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents STARTED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents PASSED

kafka.producer.AsyncProducerTest > testBatchSize STARTED

kafka.producer.AsyncProducerTest > testBatchSize PASSED

kafka.producer.AsyncProducerTest > testSerializeEvents STARTED

kafka.producer.AsyncProducerTest > testSerializeEvents PASSED

kafka.producer.AsyncProducerTest > testProducerQueueSize STARTED

kafka.producer.AsyncProducerTest > testProducerQueueSize PASSED

kafka.producer.AsyncProducerTest > testRandomPartitioner STARTED

kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED

kafka.producer.AsyncProducerTest > testInvalidConfiguration STARTED

kafka.producer.AsyncProducerTest > testInvalidConfiguration PASSED

kafka.producer.AsyncProducerTest > testInvalidPartition STARTED

kafka.producer.AsyncProducerTest > testInvalidPartition PASSED

kafka.producer.AsyncProducerTest > testNoBroker STARTED

kafka.producer.AsyncProducerTest > testNoBroker PASSED

kafka.producer.AsyncProducerTest > testProduceAfterClosed STARTED

kafka.producer.AsyncProducerTest > testProduceAfterClosed PASSED

kafka.producer.AsyncProducerTest > testJavaProducer STARTED

kafka.producer.AsyncProducerTest > testJavaProducer PASSED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder STARTED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED

kafka.producer.ProducerTest > testSendToNewTopic STARTED

kafka.producer.ProducerTest > testSendToNewTopic PASSED

kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout STARTED

kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout PASSED

kafka.producer.ProducerTest > testSendNullMessage STARTED

kafka.producer.ProducerTest > testSendNullMessage PASSED

kafka.producer.ProducerTest > testUpdateBrokerPartitionInfo STARTED

kafka.producer.ProducerTest > testUpdateBrokerPartitionInfo PASSED

kafka.producer.ProducerTest > testSendWithDeadBroker STARTED

kafka.producer.ProducerTest > testSendWithDeadBroker PASSED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChangeHandlerForDataChange 
STARTED

kafka.zookee

Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-03-02 Thread Jun Rao
linear hashing (or partition splitting) vs rehashing every key: The benefit
of the former is that it reduces the # of partitions to which keys from an
existing partition are re-distributed, which potentially reduces the
overhead of rebuilding the state in a consumer. The downside is that the
load may not be distributed evenly across partitions unless # partitions is
doubled. When the number of existing partitions is already large, one may
not want to always double the partitions.

Even with linear hashing, certain consumer instances within a consumer
group still need to rebuild the state for some partitions. This can still
affect the overall latency of a streaming job if the processing depends on
data coming from all partitions. Another way to improve this is to add
partitions in two steps. In the first step, new partitions will be added
and exposed to the consumer, but not the producer. After this step, the
consumer can start preparing the state for the new partitions, but won't
need to use them since there is no data in those new partitions yet. In the
second step, the producer can start publishing to the new partitions. At
this point, the consumer needs to process the data in new partitions.
However, if the state for the new partitions is almost ready, the amount
of waiting will be minimal. We can potentially add a new config that
controls the delay between the 2 steps.
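
A rough sketch of that two-step hand-off (everything here, including the
config name, is hypothetical and not an existing Kafka API):

```java
public class TwoStepExpansion {
    // Hypothetical config controlling the delay between exposing new partitions to
    // consumers (step 1) and allowing producers to write to them (step 2).
    static final String PRODUCER_ENABLE_DELAY_MS_CONFIG = "partition.expansion.producer.delay.ms";

    enum Step { CONSUMER_VISIBLE, PRODUCER_ENABLED }

    private final long expandedAtMs;
    private final long producerEnableDelayMs;
    private Step step = Step.CONSUMER_VISIBLE;

    TwoStepExpansion(long expandedAtMs, long producerEnableDelayMs) {
        this.expandedAtMs = expandedAtMs;
        this.producerEnableDelayMs = producerEnableDelayMs;
    }

    /** Producers may publish to the new partitions only after the delay elapses;
     *  consumers can see them (and pre-build state) the whole time. */
    boolean producersMayWrite(long nowMs) {
        if (step == Step.CONSUMER_VISIBLE && nowMs >= expandedAtMs + producerEnableDelayMs) {
            step = Step.PRODUCER_ENABLED;
        }
        return step == Step.PRODUCER_ENABLED;
    }

    public static void main(String[] args) {
        TwoStepExpansion expansion = new TwoStepExpansion(0L, 60_000L);
        System.out.println(expansion.producersMayWrite(30_000L));  // false: consumer-only phase
        System.out.println(expansion.producersMayWrite(61_000L));  // true: producers enabled
    }
}
```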

Thanks,

Jun

On Fri, Mar 2, 2018 at 1:28 PM, Jay Kreps  wrote:

> Hey Dong,
>
> Cool, obviously we'd need to have a solution here work with connect and
> streams to be viable.
>
> On the linear hashing thing, what I am talking about is something
> different. I am talking about splitting existing partitions incrementally.
> E.g. if you have 100 partitions and want to move to 110. Obviously a naive
> approach which added partitions would require you to reshuffle all data as
> the hashing of all data would change. A linear hashing-like scheme gives an
> approach by which you can split individual partitions one at a time to
> avoid needing to reshuffle much data. This approach has the benefit that at
> any time you have a fixed number of partitions and all data is fully
> partitioned with whatever the partition count you choose is but also has
> the benefit that you can dynamically scale up or down the partition count.
> This seems like it simplifies things like log compaction etc.
>
> -Jay
>
> On Sun, Feb 25, 2018 at 3:51 PM, Dong Lin  wrote:
>
> > Hey Jay,
> >
> > Thanks for the comment!
> >
> > I have not specifically thought about how this works with Streams and
> > Connect. The current KIP w.r.t. the interface that our producer and
> > consumer exposes to the user. It ensures that if there are two messages
> > with the same key produced by the same producer, say messageA and
> messageB,
> > and suppose messageB is produced after messageA to a different partition
> > than messageA, then we can guarantee that the following sequence can
> happen
> > in order:
> >
> > - Consumer of messageA can execute callback, in which user can flush
> state
> > related to the key of messageA.
> > - messageA is delivered by its consumer to the application
> > - Consumer of messageB can execute callback, in which user can load the
> > state related to the key of messageB.
> > - messageB is delivered by its consumer to the application.
> >
> > So it seems that it should support Streams and Connect properly. But I am
> > not entirely sure because I have not looked into how Streams and Connect
> > works. I can think about it more if you can provide an example where this
> > does not work for Streams and Connect.
> >
> > Regarding the second question, I think linear hashing approach provides a
> > way to reduce the number of partitions that can "conflict" with a give
> > partition to *log_2(n)*, as compares to *n* in the current KIP, where n
> is
> > the total number of partitions of the topic. This will be useful when
> > number of partition is large and asymptotic complexity matters.
> >
> > I personally don't think this optimization is worth the additional
> > complexity in Kafka. This is because partition expansion or deletion
> should
> > happen infrequently and the largest number of partitions of a single
> topic
> > today is not that large -- probably 1000 or less. And when partitions of
> a
> > topic changes, each consumer will likely need to query and wait for
> > positions of a large percentage of partitions of the topic anyway even
> with
> > this optimization. I think this algorithm is kind of orthogonal to this
> > KIP. We can extend the KIP to support this algorithm in the future as
> well.
> >
> > Thanks,
> > Dong
> >
> > On Thu, Feb 22, 2018 at 5:19 PM, Jay Kreps  wrote:
> >
> > > Hey Dong,
> > >
> > > Two questions:
> > > 1. How will this work with Streams and Connect?
> > > 2. How does this compare to a solution where we physically split
> > partitions
> > > using a linear hashing approach (the partition number is equivalent to
> > the
> >

Re: [DISCUSS] KIP-263: Allow broker to skip sanity check of inactive segments on broker startup

2018-03-02 Thread Colin McCabe
Hi Dong,

This seems like a nice improvement.  Is there any way we could avoid adding a 
new configuration value?

It's not clear to me why we would want the old behavior.

best,
Colin


On Tue, Feb 27, 2018, at 23:57, Stephane Maarek wrote:
> This is great and definitely needed. I'm not exactly sure what goes into
> the process of checking log files at startup, but is there something like
> signature checks of files (especially closed, immutable ones) that could be
> saved on disk and checked against at startup? Wouldn't that help speed up
> boot time, for all segments?
> 
> On 26 Feb. 2018 5:28 pm, "Dong Lin"  wrote:
> 
> > Hi all,
> >
> > I have created KIP-263: Allow broker to skip sanity check of inactive
> > segments on broker startup. See
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 263%3A+Allow+broker+to+skip+sanity+check+of+inactive+
> > segments+on+broker+startup
> > .
> >
> > This KIP provides a way to significantly reduce time to rolling bounce a
> > Kafka cluster.
> >
> > Comments are welcome!
> >
> > Thanks,
> > Dong
> >
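
The signature-check idea raised above (checksumming closed, immutable segments
and verifying the stored value at startup) could look roughly like this. It is
a sketch of the suggestion only, not an existing Kafka mechanism or anything
proposed in KIP-263:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

public class SegmentChecksum {
    // Compute a CRC32 over a closed, immutable segment file. A real implementation
    // would stream the file rather than read it fully into memory, and would persist
    // the value (e.g. alongside the segment) to compare against at startup.
    static long checksumOf(Path segmentFile) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(segmentFile));
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path segment = Paths.get(args[0]);  // e.g. a .log segment file
        System.out.printf("%s crc32=%d%n", segment, checksumOf(segment));
    }
}
```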


[jira] [Resolved] (KAFKA-4854) Producer RecordBatch executes callbacks with `null` provided for metadata if an exception is encountered

2018-03-02 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-4854.
--
Resolution: Not A Bug
  Assignee: Ewen Cheslack-Postava

This behavior is intended. The idea is to have *either* valid metadata about 
the produced messages based on the successful reply from the broker *or* an 
exception indicating why production failed. Metadata about produced messages 
doesn't make sense in the case of an exception since the exception implies the 
messages were not successfully added to the log.
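
A usage sketch of that contract with the standard producer API (topic name and
configs are illustrative): exactly one of the two callback arguments is
non-null, so handle the two cases explicitly.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CallbackExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (KafkaProducer<String, String> producer =
                     new KafkaProducer<>(props, new StringSerializer(), new StringSerializer())) {
            ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key", "value");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Send failed: metadata is null, the exception says why.
                    System.err.println("Send failed: " + exception);
                } else {
                    // Send succeeded: metadata carries partition, offset, etc.
                    System.out.printf("Sent to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```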

> Producer RecordBatch executes callbacks with `null` provided for metadata if 
> an exception is encountered
> 
>
> Key: KAFKA-4854
> URL: https://issues.apache.org/jira/browse/KAFKA-4854
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.1.1
>Reporter: Robert Quinlivan
>Assignee: Ewen Cheslack-Postava
>Priority: Minor
>
> When using a user-provided callback with the producer, the `RecordBatch` 
> executes the callbacks with a null metadata argument if an exception was 
> encountered. For monitoring and debugging purposes, I would prefer if the 
> metadata were provided, perhaps optionally. For example, it would be useful 
> to know the size of the serialized payload and the offset so these values 
> could appear in application logs.
> To be entirely clear, the piece of code I am considering is in 
> `org.apache.kafka.clients.producer.internals.RecordBatch#done`:
> ```java
> // execute callbacks
> for (Thunk thunk : thunks) {
>     try {
>         if (exception == null) {
>             RecordMetadata metadata = thunk.future.value();
>             thunk.callback.onCompletion(metadata, null);
>         } else {
>             thunk.callback.onCompletion(null, exception);
>         }
>     } catch (Exception e) {
>         log.error("Error executing user-provided callback on message for topic-partition '{}'", topicPartition, e);
>     }
> }
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)