[jira] [Created] (KAFKA-7129) Dynamic default value for number of thread configuration

2018-07-03 Thread Damien Gasparina (JIRA)
Damien Gasparina created KAFKA-7129:
---

 Summary: Dynamic default value for number of thread configuration
 Key: KAFKA-7129
 URL: https://issues.apache.org/jira/browse/KAFKA-7129
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Damien Gasparina


There are broker properties that change the number of threads of a component 
(e.g. _num.replica.fetchers_ or _num.network.threads_). After discussing with 
[~astubbs], it seems that the default values are optimized for an 8-CPU machine 
and might not be suitable for larger machines (e.g. 48 cores). 

On those larger machines, an admin needs to tune them to make use of all the 
resources of the host.

Having dynamic default values (e.g. _num.replica.fetchers = ceil(number of cores 
/ 8)_, etc.) instead of static ones (e.g. _num.replica.fetchers = 1_) could be a 
more efficient strategy to provide default values that are optimized for 
different kinds of deployment.
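
For illustration, a minimal sketch of this rule in plain Java (the divisor of 8 and 
the property it feeds are just the examples from this description, not an agreed 
design):

{code:java}
public class DynamicThreadDefaults {
    // Illustrative only: derive a per-component thread default from the host's
    // core count, e.g. num.replica.fetchers = ceil(cores / 8) instead of a static 1.
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int replicaFetchers = (int) Math.ceil(cores / 8.0);
        System.out.println("cores=" + cores + " -> num.replica.fetchers=" + replicaFetchers);
    }
}
{code}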



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7130) EOFException after rolling log segment

2018-07-03 Thread Karsten Schnitter (JIRA)
Karsten Schnitter created KAFKA-7130:


 Summary: EOFException after rolling log segment
 Key: KAFKA-7130
 URL: https://issues.apache.org/jira/browse/KAFKA-7130
 Project: Kafka
  Issue Type: Bug
  Components: replication
Affects Versions: 1.1.0
Reporter: Karsten Schnitter


When rolling a log segment, one of our Kafka clusters got an immediate read error 
on the same partition. This led to a flood of log messages containing the 
corresponding stacktraces. Data was still appended to the partition, but 
consumers were unable to read from that partition. The reason for the exception 
is unclear.

{noformat}
[2018-07-02 23:53:32,732] INFO [Log partition=ingestion-3, 
dir=/var/vcap/store/kafka] Rolled new log segment at offset 971865991 in 1 ms. 
(kafka.log.Log)
[2018-07-02 23:53:32,739] INFO [ProducerStateManager partition=ingestion-3] 
Writing producer snapshot at offset 971865991 (kafka.log.ProducerStateManager)
[2018-07-02 23:53:32,739] INFO [Log partition=ingestion-3, 
dir=/var/vcap/store/kafka] Rolled new log segment at offset 971865991 in 1 ms. 
(kafka.log.Log)
[2018-07-02 23:53:32,750] ERROR [ReplicaManager broker=1] Error processing 
fetch operation on partition ingestion-3, offset 971865977 
(kafka.server.ReplicaManager)

Caused by: java.io.EOFException: Failed to read `log header` from file channel 
`sun.nio.ch.FileChannelImpl@2e0e8810`. Expected to read 17 bytes, but reached 
end of file after reading 0 bytes. Started read from position 2147483643.
{noformat}

We mitigated the issue by stopping the affected node and deleting the 
corresponding directory. Once the partition was recreated for the replica (we 
use a replication factor of 2), the other replica experienced the same problem. 
We mitigated it the same way.

It is unclear to us what caused this issue. Can you help us find the root cause 
of this problem?
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-03 Thread John Roesler
Hi Guozhang,

I see. It seems like if we want to decouple 1) and 2), we need to alter the
definition of the window. Do you think it would close the gap if we added a
"window close" time to the window definition?

Such as:

builder.stream("input")
.groupByKey()
.windowedBy(
  TimeWindows
.of(60_000)
.closeAfter(10 * 60)
.until(30L * 24 * 60 * 60 * 1000)
)
.count()
.suppress(Suppression.finalResultsOnly());

Possibly called "finalResultsAtWindowClose" or something?

Thanks,
-John

On Mon, Jul 2, 2018 at 6:50 PM Guozhang Wang  wrote:

> Hey John,
>
> Obviously I'm too lazy on email replying diligence compared with you :)
> Will try to reply them separately:
>
>
> -
>
> To reply your email on "Mon, Jul 2, 2018 at 8:23 AM":
>
> I'm aware of this use case, but again, the concern is that, in this setting
> in order to let the window be queryable for 30 days, we will actually
> process data as old as 30 days as well, while most of the late updates
> beyond 5 minutes would be discarded anyways. Personally I think for the
> final update scenario, the ideal situation users would want is that "do not
> process any data that is less than 5 minutes, and of course no update
> records to the downstream later than 5 minutes either; but retain the
> window to be queryable for 30 days". And by doing that the final window
> snapshot would also be aligned with the update stream as well. In other
> words, among these three periods:
>
> 1) the retention length of the window / table.
> 2) the late records acceptance for updating the window.
> 3) the late records update to be sent downstream.
>
> Final update use cases would naturally want 2) = 3), while 1) may be
> different and larger, while what we provide now is that 1) = 2), which
> could be different and in practice larger than 3), hence not the most
> intuitive for their needs.
>
>
>
> -
>
> To reply your email on "Mon, Jul 2, 2018 at 10:27 AM":
>
> I'd like option 2) over option 1) better as well from programming pov. But
> I'm wondering if option 2) would provide the above semantics or it is still
> coupling 1) with 2) as well ?
>
>
>
> Guozhang
>
>
>
>
> On Mon, Jul 2, 2018 at 1:08 PM, John Roesler  wrote:
>
> > In fact, to push the idea further (which IIRC is what Matthias originally
> > proposed), if we can accept "Suppression#finalResultsOnly" in my last
> > email, then we could also consider whether to eliminate
> > "suppressLateEvents" entirely.
> >
> > We could always add it later, but you've both expressed doubt that there
> > are practical use cases for it outside of final-results.
> >
> > -John
> >
> > On Mon, Jul 2, 2018 at 12:27 PM John Roesler  wrote:
> >
> > > Hi again, Guozhang ;) Here's the second part of my response...
> > >
> > > It seems like your main concern is: "if I'm a user who wants final
> update
> > > semantics, how complicated is it for me to get it?"
> > >
> > > I think we have to assume that people don't always have time to become
> > > deeply familiar with all the nuances of a programming environment
> before
> > > they use it. Especially if they're evaluating several frameworks for
> > their
> > > use case, it's very valuable to make it as obvious as possible how to
> > > accomplish various computations with Streams.
> > >
> > > To me the biggest question is whether with a fresh perspective, people
> > > would say "oh, I get it, I have to bound my lateness and suppress
> > > intermediate updates, and of course I'll get only the final result!",
> or
> > if
> > > it's more like "wtf? all I want is the final result, what are all these
> > > parameters?".
> > >
> > > I was talking with Matthias a while back, and he had an idea that I
> think
> > > can help, which is to essentially set up a final-result recipe in
> > addition
> > > to the raw parameters. I previously thought that it wouldn't be
> possible
> > to
> > > restrict its usage to Windowed KTables, but thinking about it again
> this
> > > weekend, I have a couple of ideas:
> > >
> > > 
> > > = 1. Static Wrapper =
> > > 
> > > We can define an extra static function that "wraps" a KTable with
> > > final-result semantics.
> > >
> > > public static <K, V> KTable<Windowed<K>, V> finalResultsOnly(
> > >   final KTable<Windowed<K>, V> windowedKTable,
> > >   final Duration maxAllowedLateness,
> > >   final Suppression.BufferFullStrategy bufferFullStrategy) {
> > > return windowedKTable.suppress(
> > > Suppression.suppressLateEvents(maxAllowedLateness)
> > >.suppressIntermediateEvents(
> > >  IntermediateSuppression
> > >.emitAfter(maxAllowedLateness)
> > >.bufferFullStrategy(bufferFullStrategy)
> > >)
> > > );
> > > }
> > >
> > > Because windowedKTable is a parameter, the static function can easily
> > > impose an e

Re: [DISCUSS] KIP-326: Schedulable KTable as Graph source

2018-07-03 Thread flaviostutz
Great feature you have there!

I'll try to exercise here how we would achieve the same functional objectives 
using your KIP:

EXERCISE 1:
  - The case is "total counting of events for a huge website"
  - Tasks from Application A will have something like:
 .stream(/site-events)
 .count()
 .publish(/single-partitioned-topic-with-count-partials)
  - The published messages will be, for example:
  ["counter-task1", 2345]
  ["counter-task2", 8495]
  ["counter-task3", 4839]
  - Single Task from Application B will have something like:
 .stream(/single-partitioned-topic-with-count-partials)
 .aggregate(by messages whose key starts with "counter")
 .publish(/counter-total)
  - FAIL HERE. How would I know what the overall number of partitions is? Two 
partials from the same task might arrive before the other tasks' partials, and 
they may be aggregated twice.

I tried to think about using GlobalKTables, but I didn't get an easy way to 
aggregate the keys from that table. Do you have any clue?

Thanks.

-Flávio Stutz






/partial-counters-to-single-partitioned-topic

On 2018/07/02 20:03:57, John Roesler  wrote: 
> Hi Flávio,
> 
> Thanks for the KIP. I'll apologize that I'm arriving late to the
> discussion. I've tried to catch up, but I might have missed some nuances.
> 
> Regarding KIP-328, the idea is to add the ability to suppress intermediate
> results from all KTables, not just windowed ones. I think this could
> support your use case in combination with the strategy that Guozhang
> proposed of having one or more pre-aggregation steps that ultimately push
> into a single-partition topic for final aggregation. Suppressing
> intermediate results would solve the problem you noted that today
> pre-aggregating doesn't do much to staunch the flow of updates.
> 
> I'm not sure if this would be good enough for you overall; I just wanted to
> clarify the role of KIP-328.
> In particular, the solution you mentioned is to have the downstream KTables
> actually query the upstream ones to compute their results. I'm not sure
> whether it's more efficient to do these queries on the schedule, or to have
> the upstream tables emit their results, on the same schedule.
> 
> What do you think?
> 
> Thanks,
> -John
> 
> On Sun, Jul 1, 2018 at 10:03 PM flaviost...@gmail.com 
> wrote:
> 
> > For what I understood, that KIP is related to how KStreams will handle
> > KTable updates in Windowed scenarios to optimize resource usage.
> > I couldn't see any specific relation to this KIP. Had you?
> >
> > -Flávio Stutz
> >
> >
> > On 2018/06/29 18:14:46, "Matthias J. Sax"  wrote:
> > > Flavio,
> > >
> > > thanks for cleaning up the KIP number collision.
> > >
> > > With regard to KIP-328
> > > (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables
> > )
> > > I am wondering how both relate to each other?
> > >
> > > Any thoughts?
> > >
> > >
> > > -Matthias
> > >
> > > On 6/29/18 10:23 AM, flaviost...@gmail.com wrote:
> > > > Just copying a follow up from another thread to here (sorry about the
> > mess):
> > > >
> > > > From: Guozhang Wang 
> > > > Subject: Re: [DISCUSS] KIP-323: Schedulable KTable as Graph source
> > > > Date: 2018/06/25 22:24:17
> > > > List: dev@kafka.apache.org
> > > >
> > > > Flávio, thanks for creating this KIP.
> > > >
> > > > I think this "single-aggregation" use case is common enough that we
> > should
> > > > consider how to efficiently supports it: for example, for KSQL that's
> > built
> > > > on top of Streams, we've seen lots of query statements whose return is
> > > > expected a single row indicating the "total aggregate" etc. See
> > > > https://github.com/confluentinc/ksql/issues/430 for details.
> > > >
> > > > I've not read through https://issues.apache.org/jira/browse/KAFKA-6953,
> > but
> > > > I'm wondering if we have discussed the option of supporting it in a
> > > > "pre-aggregate" manner: that is we do partial aggregates on parallel
> > tasks,
> > > > and then sends the partial aggregated value via a single topic
> > partition
> > > > for the final aggregate, to reduce the traffic on that single
> > partition and
> > > > hence the final aggregate workload.
> > > > Of course, for non-commutative aggregates we'd probably need to provide
> > > > another API in addition to aggregate, like the `merge` function for
> > > > session-based aggregates, to let users customize the operations of
> > merging
> > > > two partial aggregates into a single partial aggregate. What's its
> > pros and
> > > > cons compared with the current proposal?
> > > >
> > > >
> > > > Guozhang
> > > > On 2018/06/26 18:22:27, Flávio Stutz  wrote:
> > > >> Hey, guys, I've just created a new KIP about creating a new DSL graph
> > > >> source for realtime partitioned consolidations.
> > > >>
> > > >> We have faced the following scenario/problem in a lot of situations
> > with
> > > >> KStreams:
> > > >>- Huge incoming data being pr

[VOTE] KIP-322: Return new error code for DeleteTopics API when topic deletion disabled.

2018-07-03 Thread Manikumar
Manikumar 
Fri, Jun 29, 7:59 PM (4 days ago)
to dev
Hi All,

I would like to start voting on KIP-322 which would return new error code
for DeleteTopics API when topic deletion disabled.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87295558

Thanks,


Re: [VOTE] KIP-322: Return new error code for DeleteTopics API when topic deletion disabled.

2018-07-03 Thread Gwen Shapira
+1

On Tue, Jul 3, 2018 at 8:24 AM, Manikumar  wrote:

> Manikumar 
> Fri, Jun 29, 7:59 PM (4 days ago)
> to dev
> Hi All,
>
> I would like to start voting on KIP-322 which would return new error code
> for DeleteTopics API when topic deletion disabled.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87295558
>
> Thanks,
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog



Re: [kafka-clients] [VOTE] 1.0.2 RC1

2018-07-03 Thread Rajini Sivaram
Hi Matthias,

+1 (binding)

Thank you for running the release.

Ran quick start with binary, tests with source, checked javadocs.

Regards,

Rajini

On Mon, Jul 2, 2018 at 9:34 PM, Harsha  wrote:

> +1.
>
> 1) Ran unit tests
> 2) 3 node cluster , tested basic operations.
>
> Thanks,
> Harsha
>
> On Mon, Jul 2nd, 2018 at 11:57 AM, Jun Rao  wrote:
>
> >
> >
> >
> > Hi, Matthias,
> >
> > Thanks for running the release. Verified quickstart on scala 2.12
> > binary. +1
> >
> > Jun
> >
> > On Fri, Jun 29, 2018 at 10:02 PM, Matthias J. Sax <
> matth...@confluent.io >
> >
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the second candidate for release of Apache Kafka 1.0.2.
> > >
> > > This is a bug fix release addressing 27 tickets:
> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+1.0.2
> > >
> > > Release notes for the 1.0.2 release:
> > > http://home.apache.org/~mjsax/kafka-1.0.2-rc1/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by end of next week (7/6/18).
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~mjsax/kafka-1.0.2-rc1/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~mjsax/kafka-1.0.2-rc1/javadoc/
> > >
> > > * Tag to be voted upon (off 1.0 branch) is the 1.0.2 tag:
> > > https://github.com/apache/kafka/releases/tag/1.0.2-rc1
> > >
> > > * Documentation:
> > > http://kafka.apache.org/10/documentation.html
> > >
> > > * Protocol:
> > > http://kafka.apache.org/10/protocol.html
> > >
> > > * Successful Jenkins builds for the 1.0 branch:
> > > Unit/integration tests: https://builds.apache.org/job/
> kafka-1.0-jdk7/214/
> >
> > > System tests:
> > > https://jenkins.confluent.io/job/system-test-kafka/job/1.0/225/
> > >
> > > /**
> > >
> > > Thanks,
> > > -Matthias
> > >
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > Groups
> > > "kafka-clients" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> > an
> > > email to kafka-clients+ unsubscr...@googlegroups.com.
> > > To post to this group, send email to kafka-clie...@googlegroups.com.
> > > Visit this group at https://groups.google.com/group/kafka-clients.
> > > To view this discussion on the web visit https://groups.google.com/d/
> > > msgid/kafka-clients/ca183ad4-9285-e423-3850-261f9dfec044%40
> confluent.io.
> >
> > > For more options, visit https://groups.google.com/d/optout.
> > >
> >
> >
> >
> >
>


Re: [VOTE] KIP-322: Return new error code for DeleteTopics API when topic deletion disabled.

2018-07-03 Thread Vahid S Hashemian
+1 (non-binding)

--Vahid



From:   Gwen Shapira 
To: dev 
Date:   07/03/2018 08:49 AM
Subject:Re: [VOTE] KIP-322: Return new error code for DeleteTopics 
API when topic deletion disabled.



+1

On Tue, Jul 3, 2018 at 8:24 AM, Manikumar  
wrote:

> Manikumar 
> Fri, Jun 29, 7:59 PM (4 days ago)
> to dev
> Hi All,
>
> I would like to start voting on KIP-322 which would return new error 
code
> for DeleteTopics API when topic deletion disabled.
>
> 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87295558

>
> Thanks,
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter <
https://twitter.com/ConfluentInc
> | blog
<
http://www.confluent.io/blog
>






Re: [VOTE] KIP-322: Return new error code for DeleteTopics API when topic deletion disabled.

2018-07-03 Thread Mickael Maison
+1 (non binding)
Thanks for the KIP

On Tue, Jul 3, 2018 at 4:59 PM, Vahid S Hashemian
 wrote:
> +1 (non-binding)
>
> --Vahid
>
>
>
> From:   Gwen Shapira 
> To: dev 
> Date:   07/03/2018 08:49 AM
> Subject:Re: [VOTE] KIP-322: Return new error code for DeleteTopics
> API when topic deletion disabled.
>
>
>
> +1
>
> On Tue, Jul 3, 2018 at 8:24 AM, Manikumar 
> wrote:
>
>> Manikumar 
>> Fri, Jun 29, 7:59 PM (4 days ago)
>> to dev
>> Hi All,
>>
>> I would like to start voting on KIP-322 which would return new error
> code
>> for DeleteTopics API when topic deletion disabled.
>>
>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87295558
>
>>
>> Thanks,
>>
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter <
> https://twitter.com/ConfluentInc
>> | blog
> <
> http://www.confluent.io/blog
>>
>
>
>
>


Re: [DISCUSS] KIP-325: Extend Consumer Group Command to Show Beginning Offsets

2018-07-03 Thread Vahid S Hashemian
Hi Jason,

Thanks for the feedback. Your suggestions make sense to me. I think I'm 
more in favor of adding this info to the kafka-topics tool (through another 
KIP), since it is not consumer-group specific.
I'll wait for Gwen and others to comment before making changes to the KIP.
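
For context, here is a minimal sketch of the underlying beginning/end-offset
lookup such a tool would surface (plain Java consumer API; the bootstrap server,
topic, and partition below are placeholders), including the empty-partition case
James describes further down in this thread:

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class PartitionSizeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder
            Set<TopicPartition> tps = Collections.singleton(tp);

            Map<TopicPartition, Long> begin = consumer.beginningOffsets(tps);
            Map<TopicPartition, Long> end = consumer.endOffsets(tps);

            // startOffset == endOffset means the partition is currently empty
            // (e.g. all data expired), which is what makes "lag" ambiguous today.
            long size = end.get(tp) - begin.get(tp);
            System.out.println(tp + " size=" + size + (size == 0 ? " (empty)" : ""));
        }
    }
}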

--Vahid




From:   Jason Gustafson 
To: dev 
Date:   06/28/2018 02:39 PM
Subject:Re: [DISCUSS] KIP-325: Extend Consumer Group Command to 
Show Beginning Offsets



Hey Gwen/Vahid,

I think that use case makes sense, but isn't it a little odd to go through
the consumer group tool? I would expect to find something like that from
the kafka-topics tool instead. I don't feel too strongly about it, but I
hate to complicate the tool by adding the need to query topic configs. If
we don't have a meaningful number to report for compacted topics anyway,
then it feels like only a half solution. I'd probably suggest leaving this
out or just reporting the absolute difference even if a topic is 
compacted.

-Jason



On Thu, Jun 28, 2018 at 1:05 PM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Hi James,
>
>
>
> Thanks for the feedback. I updated the KIP and added some of the 
benefits
>
> of this improvement (including some that you mentioned).
>
>
>
> Regards.
>
> --Vahid
>
>
>
>
>
>
>
> From:   James Cheng 
>
> To: dev@kafka.apache.org
>
> Date:   06/27/2018 09:13 PM
>
> Subject:Re: [DISCUSS] KIP-325: Extend Consumer Group Command to
>
> Show Beginning Offsets
>
>
>
>
>
>
>
> The “Motivation” section of the KIP says that the starting offset will 
be
>
> useful but doesn’t say why. Can you add a use-case or two to describe 
how
>
> it will be useful?
>
>
>
> In our case (see
>
> 
https://github.com/wushujames/kafka-utilities/blob/master/Co

> nsumerGroupLag/README.md
>
> ), we found the starting offset useful so that we could calculate
>
> partition size so that we could identify empty partitions (partitions
>
> where all the data had expired). In particular, we needed that info so
>
> that we could calculate “lag”. Consider that case where we are asked to
>
> mirror an abandoned topic where startOffset==endOffset==100. We 
would
>
> have no committed offset on it, and the topic has no data in it, so we
>
> won’t soon get any committed offset, and so “lag” is kind of undefined. 
We
>
> used the additional startOffset to allow us to detect this case.
>
>
>
> -James
>
>
>
> Sent from my iPhone
>
>
>
> > On Jun 26, 2018, at 11:23 AM, Vahid S Hashemian
>
>  wrote:
>
> >
>
> > Hi everyone,
>
> >
>
> > I have created a trivial KIP to improve the offset reporting of the
>
> > consumer group command:
>
> >
>
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-325%

> 3A+Extend+Consumer+Group+Command+to+Show+Beginning+Offsets
>
>
>
> > Looking forward to your feedback!
>
> >
>
> > Thanks.
>
> > --Vahid
>
> >
>
> >
>
>
>
>
>
>
>
>
>
>






Re: [VOTE] KIP-322: Return new error code for DeleteTopics API when topic deletion disabled.

2018-07-03 Thread Ted Yu
+1

On Tue, Jul 3, 2018 at 9:05 AM, Mickael Maison 
wrote:

> +1 (non binding)
> Thanks for the KIP
>
> On Tue, Jul 3, 2018 at 4:59 PM, Vahid S Hashemian
>  wrote:
> > +1 (non-binding)
> >
> > --Vahid
> >
> >
> >
> > From:   Gwen Shapira 
> > To: dev 
> > Date:   07/03/2018 08:49 AM
> > Subject:Re: [VOTE] KIP-322: Return new error code for
> DeleteTopics
> > API when topic deletion disabled.
> >
> >
> >
> > +1
> >
> > On Tue, Jul 3, 2018 at 8:24 AM, Manikumar 
> > wrote:
> >
> >> Manikumar 
> >> Fri, Jun 29, 7:59 PM (4 days ago)
> >> to dev
> >> Hi All,
> >>
> >> I would like to start voting on KIP-322 which would return new error
> > code
> >> for DeleteTopics API when topic deletion disabled.
> >>
> >>
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=87295558
> >
> >>
> >> Thanks,
> >>
> >
> >
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter <
> > https://twitter.com/ConfluentInc
> >> | blog
> > <
> > http://www.confluent.io/blog
> >>
> >
> >
> >
> >
>


Re: [DISCUSS] KIP-326: Schedulable KTable as Graph source

2018-07-03 Thread John Roesler
Hi Flávio,
Thanks! I think that we can actually do this, but the API could be better.
I've included Java code below, but I'll copy and modify your example so
we're on the same page.

EXERCISE 1:
  - The case is "total counting of events for a huge website"
  - Tasks from Application A will have something like:
 .stream(/site-events)
 .transform( re-key s.t. the new key is the partition id)
 .groupByKey() // you have to do this before count
 .count()
  // you explicitly published to a one-partition topic here, but
  // it's actually sufficient just to re-group onto one key.
  // You could name and pre-create the intermediate topic here,
  // but you don't need a separate application for the final aggregation.
 .groupBy((partitionId, partialCount) -> new KeyValue("ALL", partialCount))
 .aggregate(sum up the partialCounts)
 .publish(/counter-total)

I've left out the suppressions, but they would go right after the count()
and the aggregate().
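
For concreteness, here is a sketch of where a suppress() could sit, using the
variable names from the Java version below and the Suppression names being
discussed in the KIP-328 thread; that API is still a proposal, so treat the class
names and the 5-minute duration as placeholders:

// Sketch only: Suppression / IntermediateSuppression are proposed KIP-328 names,
// not a released API, and Duration.ofMinutes(5) is an arbitrary example.
final KTable<Integer, Long> countsByPartition =
    keyedByPartition
        .groupByKey()
        .count()
        .suppress(
            Suppression.suppressLateEvents(Duration.ofMinutes(5))
                       .suppressIntermediateEvents(
                           IntermediateSuppression.emitAfter(Duration.ofMinutes(5))));

The same call would also go after the final reduce() in the roll-up step.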

With this program, you don't have to worry about the double-aggregation you
mentioned in the last email. The KTable produced by the first count() will
maintain the correct count per partition. If the value changes for any
partition, it'll emit a retraction of the old value and then the new value
downstream, so that the final aggregation can update itself properly.

I think we can optimize both the execution and the programmability by adding
a "global aggregation" concept. But in principle, it seems like this usage
of the current API will support your use case.

Once again, though, this is just to present an alternative. I haven't done
the math on whether your proposal would be more efficient.

Thanks,
-John

Here's the same algorithm written in Java:

final KStream<String, String> siteEvents = builder.stream("/site-events");

// here we re-key the events so that the key is actually the partition id.
// we don't need the value to do a count, so I just set it to "1".
final KStream<Integer, Integer> keyedByPartition = siteEvents.transform(
    () -> new Transformer<String, String, KeyValue<Integer, Integer>>() {
        private ProcessorContext context;

        @Override
        public void init(final ProcessorContext context) {
            this.context = context;
        }

        @Override
        public KeyValue<Integer, Integer> transform(final String key, final String value) {
            return new KeyValue<>(context.partition(), 1);
        }
    });

// Note that we can't do "count()" on a KStream, we have to group it first.
// I'm grouping by the key, so it will produce the count for each key.
// Since the key is actually the partition id, it will produce the
// pre-aggregated count per partition.
// Note that the result is a KTable. It'll always contain the most recent
// count for each partition.
final KTable<Integer, Long> countsByPartition =
    keyedByPartition.groupByKey().count();

// Now we get ready for the final roll-up. We re-group all the constituent
// counts.
final KGroupedTable<String, Long> singlePartition =
    countsByPartition.groupBy((key, value) -> new KeyValue<>("ALL", value));

final KTable<String, Long> totalCount =
    singlePartition.reduce((l, r) -> l + r, (l, r) -> l - r);

totalCount.toStream().foreach((k, v) -> {
    // k is always "ALL"
    // v is always the most recent total value
    System.out.println("The total event count is: " + v);
});


On Tue, Jul 3, 2018 at 9:21 AM flaviost...@gmail.com 
wrote:

> Great feature you have there!
>
> I'll try to exercise here how we would achieve the same functional
> objectives using your KIP:
>
> EXERCISE 1:
>   - The case is "total counting of events for a huge website"
>   - Tasks from Application A will have something like:
>  .stream(/site-events)
>  .count()
>  .publish(/single-partitioned-topic-with-count-partials)
>   - The published messages will be, for example:
>   ["counter-task1", 2345]
>   ["counter-task2", 8495]
>   ["counter-task3", 4839]
>   - Single Task from Application B will have something like:
>  .stream(/single-partitioned-topic-with-count-partials)
>  .aggregate(by messages whose key starts with "counter")
>  .publish(/counter-total)
>   - FAIL HERE. How would I know what the overall number of partitions is? Two
> partials from the same task might arrive before the other tasks' partials, and
> they may be aggregated twice.
>
> I tried to think about using GlobalKTables, but I didn't get an easy way
> to aggregate the keys from that table. Do you have any clue?
>
> Thanks.
>
> -Flávio Stutz
>
>
>
>
>
>
> /partial-counters-to-single-partitioned-topic
>
> On 2018/07/02 20:03:57, John Roesler  wrote:
> > Hi Flávio,
> >
> > Thanks for the KIP. I'll apologize that I'm arriving late to the
> > discussion. I've tried to catch up, but I might have missed some nuances.
> >
> > Regarding KIP-328, the idea is to add the ability to suppress
> intermediate
> > results from all KTables, not just windowed ones. I think this could
> > support your use case in combination with the strategy that Guozhang
> > proposed of having one or more pre-aggregation steps that ul

Re: [DISCUSS] KIP-331 Add default implementation to close() and configure() for Serializer, Deserializer and Serde

2018-07-03 Thread John Roesler
Hi again, Chia-Ping,

Thanks! I took a look at it, and it seems like a good change to me.

I don't know if it will generate much discussion, so I recommend just
letting it steep for another day or two and then moving on to a vote if no
one else chimes in.

Just really scraping my mind for concerns to investigate... This change
won't break source compatibility, but will it affect binary compatibility?
For example, if I compile my application against Kafka 2.0 and then swap in
the Kafka jar containing your change on my classpath at run time, will it
still work?

I think it should be binary compatible, assuming Oracle didn't do anything
crazy, since all the method references would be the same. It might be worth
an experiment, though.
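
For readers following along, here is a minimal sketch of the kind of change
KIP-331 describes, written against the shape of the existing
org.apache.kafka.common.serialization.Serializer interface; the no-op default
bodies are an illustration of the proposal, not the final code:

import java.io.Closeable;
import java.util.Map;

// Illustration of the KIP-331 idea: give configure() and close() no-op default
// implementations so that simple serializers only need to implement serialize().
public interface Serializer<T> extends Closeable {

    // Most serializers ignore the client configs, so a no-op default removes boilerplate.
    default void configure(Map<String, ?> configs, boolean isKey) {
        // no-op by default
    }

    byte[] serialize(String topic, T data);

    // Likewise, close() is usually empty for simple serializers.
    @Override
    default void close() {
        // no-op by default
    }
}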

Thanks,
-John

On Mon, Jul 2, 2018 at 9:29 PM Chia-Ping Tsai  wrote:

> > Can you provide a link, please?
>
> Pardon me. I put the page in the incorrect location.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-331+Add+default+implementation+to+close%28%29+and+configure%28%29+for+Serializer%2C+Deserializer+and+Serde
>
> Cheers,
> Chia-Ping
>
> On 2018/07/02 19:45:19, John Roesler  wrote:
> > Hi Chia-Ping,
> >
> > I couldn't find KIP-331 in the list of KIPs (
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > ).
> >
> > Can you provide a link, please?
> >
> > Thanks,
> > -John
> >
> > On Sun, Jul 1, 2018 at 11:33 AM Chia-Ping Tsai 
> wrote:
> >
> > > hi folks,
> > >
> > > KIP-331 is waiting for any suggestions, feedback and reviews. The main
> > > purpose of the KIP-331 is to add empty implementations to the close()
> and
> > > configure() so as to user can write less code to develop custom
> Serialzier,
> > > Deserializer and Serde.
> > >
> > > Cheers,
> > > Chia-Ping
> > >
> >
>


[ANNOUNCE] Apache Kafka 0.11.0.3 Released

2018-07-03 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

The Apache Kafka community is pleased to announce the release for
Apache Kafka 0.11.0.3.


This is a bug fix release and it includes fixes and improvements from
27 JIRAs, including a few critical bugs.


All of the changes in this release can be found in the release notes:


https://dist.apache.org/repos/dist/release/kafka/0.11.0.3/RELEASE_NOTES.html



You can download the source release from:


https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka-0.11.0.3-src.tgz


and binary releases from:


https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.11-0.11.0.3.tgz
(Scala 2.11)

https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.12-0.11.0.3.tgz
(Scala 2.12)


- 
- ---


Apache Kafka is a distributed streaming platform with four core APIs:


** The Producer API allows an application to publish a stream of records
to one or more Kafka topics.


** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.


** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an
output stream to one or more output topics, effectively transforming
the input streams to output streams.


** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems. For example, a connector to a relational database might
capture every change to a table.



With these APIs, Kafka can be used for two broad classes of application:


** Building real-time streaming data pipelines that reliably get data
between systems or applications.


** Building real-time streaming applications that transform or react
to the streams of data.



Apache Kafka is in use at large and small companies worldwide,
including Capital One, Goldman Sachs, ING, LinkedIn, Netflix,
Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and
Zalando, among others.



A big thank you to the following 26 contributors to this release!


Matthias J. Sax, Ewen Cheslack-Postava, Konstantine Karantasis,
Guozhang Wang, Rajini Sivaram, Randall Hauch, tedyu, Jagadesh
Adireddi, Jarek Rudzinski, Jason Gustafson, Jeremy Custenborder, Anna
Povzner, Joel Hamill, John Roesler, Max Zheng, Mickael Maison, Robert
Yokota, Yaswanth Kumar, parafiend, Jiangjie (Becket) Qin, Arjun
Satish, Bill Bejeck, Damian Guy, Gitomain, Gunnar Morling, Ismael Juma


We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
http://kafka.apache.org/


Thank you!


Regards,
 -Matthias
-BEGIN PGP SIGNATURE-
Comment: GPGTools - https://gpgtools.org

iQIzBAEBCgAdFiEEeiQdEa0SVXokodP3DccxaWtLg18FAls7wQAACgkQDccxaWtL
g1+b/g/+LjM5gh8u2wCVz7dhOstwvtaajRG7cG1QhZH3H9QquVs19aKiE9ZcvEcK
eJkX0S7rWopXs2qQxy5fVCTWGw5yO4eFNWuWxSIffuxH8/3K2sKahPi/4IDgd5Tj
ksmsxyXxWtGv/vEosJr+ZD7s1urPpkQ7DG6CT9wG9wj2ASq7sur/Eg7jfAnuIoTQ
UvQenKXU0T+D+BZKpUiZs5e6VGya6bUzboAbPGiwsMH4/xj2IlOEjVAyf3ppnuiu
/AW2LLqkFnbDB0IbveOu2+73CvVlahkaZ6nhPjkVpdpFw/SCAZHdkGdCafo8DKP8
DKcmzta/QCEJ1uQUe7Rh8ndzYLzTaU0rqilA2WZUZvTx0gkviDGvQv/S97XP8lRJ
SLn2xk166dxw0zpuIfzo0rr3S2Mz5PmAhrxiVxDG9ihaqBnABePspjp+cTXLhGhX
5zEhh1THiShjT03ZSPP8SEioQmj9LoQ9FH53/RXGmQ35O/nv4bAcvRvkqntFoF4Z
iXE0bhQ2RyffQjBc70uJfdrpRbsmPqnNKSJ+60cB9y6jN+aQBuQdjB54ypu203mp
x+yj7Fl+yf/EFbcs4aeAccAnx3J8uo6K1bKJmJtWrrBIIF28nNBrdBXGWh898rGe
+m7teNKOm6WJXnuzASja82xJjul60WWOwAFLSOL1aAqo+At5Sps=
=4xXe
-END PGP SIGNATURE-


Build failed in Jenkins: kafka-trunk-jdk8 #2789

2018-07-03 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] MINOR: Fix standby streamTime (#5288)

--
[...truncated 871.08 KB...]
kafka.security.auth.ResourceTest > shouldParseThreePartWithEmbeddedSeparators 
STARTED

kafka.security.auth.ResourceTest > shouldParseThreePartWithEmbeddedSeparators 
PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAuthorizeWithPrefixedResource 
STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAuthorizeWithPrefixedResource 
PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDeleteAllAclOnWildcardResource STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDeleteAllAclOnWildcardResource PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAclInheritance STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAclInheritance PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAddAclsOnWildcardResource 
STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAddAclsOnWildcardResource 
PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testWritesExtendedAclChangeEventWhenInterBrokerProtocolAtLeastKafkaV2 STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testWritesExtendedAclChangeEventWhenInterBrokerProtocolAtLeastKafkaV2 PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testWritesLiteralAclChangeEventWhenInterBrokerProtocolIsKafkaV2 STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testWritesLiteralAclChangeEventWhenInterBrokerProtocolIsKafkaV2 PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testDeleteAclOnPrefixedResource 
STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testDeleteAclOnPrefixedResource 
PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testSuperUserWithCustomPrincipalHasAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testSuperUserWithCustomPrincipalHasAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testAllowAccessWithCustomPrincipal STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testAllowAccessWithCustomPrincipal PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testDeleteAclOnWildcardResource 
STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testDeleteAclOnWildcardResource 
PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testChangeListenerTiming STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testChangeListenerTiming PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testWritesLiteralWritesLiteralAclChangeEventWhenInterBrokerProtocolLessThanKafkaV2eralAclChangesForOlderProtocolVersions
 STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testWritesLiteralWritesLiteralAclChangeEventWhenInterBrokerProtocolLessThanKafkaV2eralAclChangesForOlderProtocolVersions
 PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testThrowsOnAddPrefixedAclIfInterBrokerProtocolVersionTooLow STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testThrowsOnAddPrefixedAclIfInterBrokerProtocolVersionTooLow PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testAccessAllowedIfAllowAclExistsOnPrefixedResource STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testAccess

Re: [ANNOUNCE] Apache Kafka 0.11.0.3 Released

2018-07-03 Thread Guozhang Wang
Thanks Matthias for driving the release!

On Tue, Jul 3, 2018 at 11:31 AM, Matthias J. Sax  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> The Apache Kafka community is pleased to announce the release for
> Apache Kafka 0.11.0.3.
>
>
> This is a bug fix release and it includes fixes and improvements from
> 27 JIRAs, including a few critical bugs.
>
>
> All of the changes in this release can be found in the release notes:
>
>
> https://dist.apache.org/repos/dist/release/kafka/0.11.0.3/RELEASE_NOTES.
> html
>
>
>
> You can download the source release from:
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka-0.11.0.
> 3-src.tgz
>
>
> and binary releases from:
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.11-0.
> 11.0.3.tgz
> (Scala 2.11)
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.12-0.
> 11.0.3.tgz
> (Scala 2.12)
>
>
> - 
> - ---
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
> ** The Producer API allows an application to publish a stream records
> to one or more Kafka topics.
>
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming
> the input streams to output streams.
>
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.three key capabilities:
>
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
>
> ** Building real-time streaming applications that transform or react
> to the streams of data.
>
>
>
> Apache Kafka is in use at large and small companies worldwide,
> including Capital One, Goldman Sachs, ING, LinkedIn, Netflix,
> Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and
> Zalando, among others.
>
>
>
> A big thank you for the following 26 contributors to this release!
>
>
> Matthias J. Sax, Ewen Cheslack-Postava, Konstantine Karantasis,
> Guozhang Wang, Rajini Sivaram, Randall Hauch, tedyu, Jagadesh
> Adireddi, Jarek Rudzinski, Jason Gustafson, Jeremy Custenborder, Anna
> Povzner, Joel Hamill, John Roesler, Max Zheng, Mickael Maison, Robert
> Yokota, Yaswanth Kumar, parafiend, Jiangjie (Becket) Qin, Arjun
> Satish, Bill Bejeck, Damian Guy, Gitomain, Gunnar Morling, Ismael Juma
>
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
>
>
> Thank you!
>
>
> Regards,
>  -Matthias
> -BEGIN PGP SIGNATURE-
> Comment: GPGTools - https://gpgtools.org
>
> iQIzBAEBCgAdFiEEeiQdEa0SVXokodP3DccxaWtLg18FAls7wQAACgkQDccxaWtL
> g1+b/g/+LjM5gh8u2wCVz7dhOstwvtaajRG7cG1QhZH3H9QquVs19aKiE9ZcvEcK
> eJkX0S7rWopXs2qQxy5fVCTWGw5yO4eFNWuWxSIffuxH8/3K2sKahPi/4IDgd5Tj
> ksmsxyXxWtGv/vEosJr+ZD7s1urPpkQ7DG6CT9wG9wj2ASq7sur/Eg7jfAnuIoTQ
> UvQenKXU0T+D+BZKpUiZs5e6VGya6bUzboAbPGiwsMH4/xj2IlOEjVAyf3ppnuiu
> /AW2LLqkFnbDB0IbveOu2+73CvVlahkaZ6nhPjkVpdpFw/SCAZHdkGdCafo8DKP8
> DKcmzta/QCEJ1uQUe7Rh8ndzYLzTaU0rqilA2WZUZvTx0gkviDGvQv/S97XP8lRJ
> SLn2xk166dxw0zpuIfzo0rr3S2Mz5PmAhrxiVxDG9ihaqBnABePspjp+cTXLhGhX
> 5zEhh1THiShjT03ZSPP8SEioQmj9LoQ9FH53/RXGmQ35O/nv4bAcvRvkqntFoF4Z
> iXE0bhQ2RyffQjBc70uJfdrpRbsmPqnNKSJ+60cB9y6jN+aQBuQdjB54ypu203mp
> x+yj7Fl+yf/EFbcs4aeAccAnx3J8uo6K1bKJmJtWrrBIIF28nNBrdBXGWh898rGe
> +m7teNKOm6WJXnuzASja82xJjul60WWOwAFLSOL1aAqo+At5Sps=
> =4xXe
> -END PGP SIGNATURE-
>



-- 
-- Guozhang


Re: [ANNOUNCE] Apache Kafka 0.11.0.3 Released

2018-07-03 Thread Ismael Juma
Thanks Matthias!

On Tue, 3 Jul 2018, 11:31 Matthias J. Sax,  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> The Apache Kafka community is pleased to announce the release for
> Apache Kafka 0.11.0.3.
>
>
> This is a bug fix release and it includes fixes and improvements from
> 27 JIRAs, including a few critical bugs.
>
>
> All of the changes in this release can be found in the release notes:
>
>
> https://dist.apache.org/repos/dist/release/kafka/0.11.0.3/RELEASE_NOTES.
> html
> 
>
>
>
> You can download the source release from:
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka-0.11.0.
> 3-src.tgz
> 
>
>
> and binary releases from:
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.11-0.
> 11.0.3.tgz
> (Scala 2.11)
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.12-0.
> 11.0.3.tgz
> (Scala 2.12)
>
>
> - 
> - ---
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
> ** The Producer API allows an application to publish a stream records
> to one or more Kafka topics.
>
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming
> the input streams to output streams.
>
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.three key capabilities:
>
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
>
> ** Building real-time streaming applications that transform or react
> to the streams of data.
>
>
>
> Apache Kafka is in use at large and small companies worldwide,
> including Capital One, Goldman Sachs, ING, LinkedIn, Netflix,
> Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and
> Zalando, among others.
>
>
>
> A big thank you for the following 26 contributors to this release!
>
>
> Matthias J. Sax, Ewen Cheslack-Postava, Konstantine Karantasis,
> Guozhang Wang, Rajini Sivaram, Randall Hauch, tedyu, Jagadesh
> Adireddi, Jarek Rudzinski, Jason Gustafson, Jeremy Custenborder, Anna
> Povzner, Joel Hamill, John Roesler, Max Zheng, Mickael Maison, Robert
> Yokota, Yaswanth Kumar, parafiend, Jiangjie (Becket) Qin, Arjun
> Satish, Bill Bejeck, Damian Guy, Gitomain, Gunnar Morling, Ismael Juma
>
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
>
>
> Thank you!
>
>
> Regards,
>  -Matthias
> -BEGIN PGP SIGNATURE-
> Comment: GPGTools - https://gpgtools.org
>
> iQIzBAEBCgAdFiEEeiQdEa0SVXokodP3DccxaWtLg18FAls7wQAACgkQDccxaWtL
> g1+b/g/+LjM5gh8u2wCVz7dhOstwvtaajRG7cG1QhZH3H9QquVs19aKiE9ZcvEcK
> eJkX0S7rWopXs2qQxy5fVCTWGw5yO4eFNWuWxSIffuxH8/3K2sKahPi/4IDgd5Tj
> ksmsxyXxWtGv/vEosJr+ZD7s1urPpkQ7DG6CT9wG9wj2ASq7sur/Eg7jfAnuIoTQ
> UvQenKXU0T+D+BZKpUiZs5e6VGya6bUzboAbPGiwsMH4/xj2IlOEjVAyf3ppnuiu
> /AW2LLqkFnbDB0IbveOu2+73CvVlahkaZ6nhPjkVpdpFw/SCAZHdkGdCafo8DKP8
> DKcmzta/QCEJ1uQUe7Rh8ndzYLzTaU0rqilA2WZUZvTx0gkviDGvQv/S97XP8lRJ
> SLn2xk166dxw0zpuIfzo0rr3S2Mz5PmAhrxiVxDG9ihaqBnABePspjp+cTXLhGhX
> 5zEhh1THiShjT03ZSPP8SEioQmj9LoQ9FH53/RXGmQ35O/nv4bAcvRvkqntFoF4Z
> iXE0bhQ2RyffQjBc70uJfdrpRbsmPqnNKSJ+60cB9y6jN+aQBuQdjB54ypu203mp
> x+yj7Fl+yf/EFbcs4aeAccAnx3J8uo6K1bKJmJtWrrBIIF28nNBrdBXGWh898rGe
> +m7teNKOm6WJXnuzASja82xJjul60WWOwAFLSOL1aAqo+At5Sps=
> =4xXe
> -END PGP SIGNATURE-
>


Re: [ANNOUNCE] Apache Kafka 0.11.0.3 Released

2018-07-03 Thread Yishun Guan
Nice! Thanks~

On Tue, Jul 3, 2018, 12:16 PM Ismael Juma  wrote:

> Thanks Matthias!
>
> On Tue, 3 Jul 2018, 11:31 Matthias J. Sax,  wrote:
>
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA512
> >
> > The Apache Kafka community is pleased to announce the release for
> > Apache Kafka 0.11.0.3.
> >
> >
> > This is a bug fix release and it includes fixes and improvements from
> > 27 JIRAs, including a few critical bugs.
> >
> >
> > All of the changes in this release can be found in the release notes:
> >
> >
> > https://dist.apache.org/repos/dist/release/kafka/0.11.0.3/RELEASE_NOTES.
> > html
> > <
> https://dist.apache.org/repos/dist/release/kafka/0.11.0.3/RELEASE_NOTES.html
> >
> >
> >
> >
> > You can download the source release from:
> >
> >
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka-0.11.0.
> > 3-src.tgz
> > <
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka-0.11.0.3-src.tgz
> >
> >
> >
> > and binary releases from:
> >
> >
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.11-0.
> > 11.0.3.tgz
> > (Scala 2.11)
> >
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.12-0.
> > 11.0.3.tgz
> > (Scala 2.12)
> >
> >
> > -
> 
> > - ---
> >
> >
> > Apache Kafka is a distributed streaming platform with four core APIs:
> >
> >
> > ** The Producer API allows an application to publish a stream records
> > to one or more Kafka topics.
> >
> >
> > ** The Consumer API allows an application to subscribe to one or more
> > topics and process the stream of records produced to them.
> >
> >
> > ** The Streams API allows an application to act as a stream processor,
> > consuming an input stream from one or more topics and producing an
> > output stream to one or more output topics, effectively transforming
> > the input streams to output streams.
> >
> >
> > ** The Connector API allows building and running reusable producers or
> > consumers that connect Kafka topics to existing applications or data
> > systems. For example, a connector to a relational database might
> > capture every change to a table.three key capabilities:
> >
> >
> >
> > With these APIs, Kafka can be used for two broad classes of application:
> >
> >
> > ** Building real-time streaming data pipelines that reliably get data
> > between systems or applications.
> >
> >
> > ** Building real-time streaming applications that transform or react
> > to the streams of data.
> >
> >
> >
> > Apache Kafka is in use at large and small companies worldwide,
> > including Capital One, Goldman Sachs, ING, LinkedIn, Netflix,
> > Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and
> > Zalando, among others.
> >
> >
> >
> > A big thank you for the following 26 contributors to this release!
> >
> >
> > Matthias J. Sax, Ewen Cheslack-Postava, Konstantine Karantasis,
> > Guozhang Wang, Rajini Sivaram, Randall Hauch, tedyu, Jagadesh
> > Adireddi, Jarek Rudzinski, Jason Gustafson, Jeremy Custenborder, Anna
> > Povzner, Joel Hamill, John Roesler, Max Zheng, Mickael Maison, Robert
> > Yokota, Yaswanth Kumar, parafiend, Jiangjie (Becket) Qin, Arjun
> > Satish, Bill Bejeck, Damian Guy, Gitomain, Gunnar Morling, Ismael Juma
> >
> >
> > We welcome your help and feedback. For more information on how to
> > report problems, and to get involved, visit the project website at
> > http://kafka.apache.org/
> >
> >
> > Thank you!
> >
> >
> > Regards,
> >  -Matthias
> > -BEGIN PGP SIGNATURE-
> > Comment: GPGTools - https://gpgtools.org
> >
> > iQIzBAEBCgAdFiEEeiQdEa0SVXokodP3DccxaWtLg18FAls7wQAACgkQDccxaWtL
> > g1+b/g/+LjM5gh8u2wCVz7dhOstwvtaajRG7cG1QhZH3H9QquVs19aKiE9ZcvEcK
> > eJkX0S7rWopXs2qQxy5fVCTWGw5yO4eFNWuWxSIffuxH8/3K2sKahPi/4IDgd5Tj
> > ksmsxyXxWtGv/vEosJr+ZD7s1urPpkQ7DG6CT9wG9wj2ASq7sur/Eg7jfAnuIoTQ
> > UvQenKXU0T+D+BZKpUiZs5e6VGya6bUzboAbPGiwsMH4/xj2IlOEjVAyf3ppnuiu
> > /AW2LLqkFnbDB0IbveOu2+73CvVlahkaZ6nhPjkVpdpFw/SCAZHdkGdCafo8DKP8
> > DKcmzta/QCEJ1uQUe7Rh8ndzYLzTaU0rqilA2WZUZvTx0gkviDGvQv/S97XP8lRJ
> > SLn2xk166dxw0zpuIfzo0rr3S2Mz5PmAhrxiVxDG9ihaqBnABePspjp+cTXLhGhX
> > 5zEhh1THiShjT03ZSPP8SEioQmj9LoQ9FH53/RXGmQ35O/nv4bAcvRvkqntFoF4Z
> > iXE0bhQ2RyffQjBc70uJfdrpRbsmPqnNKSJ+60cB9y6jN+aQBuQdjB54ypu203mp
> > x+yj7Fl+yf/EFbcs4aeAccAnx3J8uo6K1bKJmJtWrrBIIF28nNBrdBXGWh898rGe
> > +m7teNKOm6WJXnuzASja82xJjul60WWOwAFLSOL1aAqo+At5Sps=
> > =4xXe
> > -END PGP SIGNATURE-
> >
>


[ANNOUNCE] Apache Kafka 0.10.2.2 Released

2018-07-03 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

The Apache Kafka community is pleased to announce the release for
Apache Kafka 0.10.2.2.


This is a bug fix release and it includes fixes and improvements from
29 JIRAs, including a few critical bugs.


All of the changes in this release can be found in the release notes:


https://dist.apache.org/repos/dist/release/kafka/0.10.2.2/RELEASE_NOTES.html



You can download the source release from:


https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.2/kafka-0.10.2.2-src.tgz


and binary releases from:


https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.2/kafka_2.11-0.10.2.2.tgz
(Scala 2.11)

https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.2/kafka_2.12-0.10.2.2.tgz
(Scala 2.12)


- 
- ---


Apache Kafka is a distributed streaming platform with four core APIs:


** The Producer API allows an application to publish a stream of records
to one or more Kafka topics.


** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.


** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an
output stream to one or more output topics, effectively transforming
the input streams to output streams.


** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems. For example, a connector to a relational database might
capture every change to a table.



With these APIs, Kafka can be used for two broad classes of application:


** Building real-time streaming data pipelines that reliably get data
between systems or applications.


** Building real-time streaming applications that transform or react
to the streams of data.



Apache Kafka is in use at large and small companies worldwide,
including Capital One, Goldman Sachs, ING, LinkedIn, Netflix,
Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and
Zalando, among others.



A big thank you to the following 30 contributors to this release!


Ewen Cheslack-Postava, Matthias J. Sax, Randall Hauch, Eno Thereska,
Damian Guy, Rajini Sivaram, Colin P. Mccabe, Kelvin Rutt, Kyle
Winkelman, Max Zheng, Guozhang Wang, Xavier Léauté, Konstantine
Karantasis, Paolo Patierno, Robert Yokota, Tommy Becker, Arjun Satish,
Xi Hu, Armin Braun, Edoardo Comar, Gunnar Morling, Gwen Shapira,
Hooman Broujerdi, Ismael Juma, Jaikiran Pai, Jarek Rudzinski, Jason
Gustafson, Jun Rao, Manikumar Reddy, Maytee Chinavanichkit


We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
http://kafka.apache.org/


Thank you!


Regards,
 -Matthias


-BEGIN PGP SIGNATURE-
Comment: GPGTools - https://gpgtools.org

iQIzBAEBCgAdFiEEeiQdEa0SVXokodP3DccxaWtLg18FAls70woACgkQDccxaWtL
g1+Xzw//Rb7K691p0R2qPOixZfllEuO926C9dIjiq9XA+dZrabgC4tMgAtE07Pf4
i6ZUeIqVLH3IDYIKji92K+JUIWpu6fdmCc999bJUOJG+zABMbO0uRYm7/4LwfMPR
kfjxRhxu31ewvafs3crE4Kfkekw4FLFIwHiaz3i/mKC1Ty6V4oiJcwHP4PZizE2r
rTNbt0ZHzviiBH3klOoDh+ZZFwbDZn7EHUXm8o9fiiC52o/7TIqVWwmNzZJlNGRc
bxC3boGXAXjgBwm7iqxBgkPku/kTTWpxj6jkHbS2NQfCZE5V7INQC2HlnynPHc7j
m2F2plSvKOm4gi54q6SSiXkjcXA2dBJDe3y/jNpckXSQ31sNXsTi6vbRMkMPj8dJ
j0SKhFoSCDpWejgLkUMg6hZgepgz7G1uYHA9K8SfCyCooqxsEY4I3dClNOySORly
4brdjZWpclhCn+zpekqBFZ9Sn3ipG4MOvH64chPEvYnysHkRH26FqXNPOK185V0Z
Czl0dL0aEoJWZ3LxLTSoFkncKgqrcE00q4VknK3zGW65tlQ1DqTXtK3Ta1q8vX98
PCCR4Tjhu0RcBAV2L4o43itKzIaLCp9lElA1341oQUB+tiPRA0GvWGg36EomehzF
1qdbjBug91CLyefZVVeEfTiqmNAYNyR1Zmx99rryx+Fp+5Ek9YI=
=yjnJ
-END PGP SIGNATURE-


Build failed in Jenkins: kafka-trunk-jdk10 #275

2018-07-03 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] MINOR: Include quota related interfaces to javadocs (#5325)

--
[...truncated 1.98 MB...]
org.apache.kafka.streams.StreamsConfigTest > shouldAcceptExactlyOnce PASSED

org.apache.kafka.streams.StreamsConfigTest > 
testGetGlobalConsumerConfigsWithGlobalConsumerOverridenPrefix STARTED

org.apache.kafka.streams.StreamsConfigTest > 
testGetGlobalConsumerConfigsWithGlobalConsumerOverridenPrefix PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetInternalLeaveGroupOnCloseConfigToFalseInConsumer STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetInternalLeaveGroupOnCloseConfigToFalseInConsumer PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedPropertiesThatAreNotPartOfConsumerConfig STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedPropertiesThatAreNotPartOfConsumerConfig PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldForwardCustomConfigsWithNoPrefixToAllClients STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldForwardCustomConfigsWithNoPrefixToAllClients PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfRestoreConsumerAutoCommitIsOverridden STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfRestoreConsumerAutoCommitIsOverridden PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSpecifyCorrectValueSerdeClassOnError STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSpecifyCorrectValueSerdeClassOnError PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfGlobalConsumerAutoCommitIsOverridden STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfGlobalConsumerAutoCommitIsOverridden PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfConsumerIsolationLevelIsOverriddenIfEosEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfConsumerIsolationLevelIsOverriddenIfEosEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeSupportNonPrefixedGlobalConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeSupportNonPrefixedGlobalConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportNonPrefixedAdminConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportNonPrefixedAdminConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigCommitIntervalMsIfExactlyOnceEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigCommitIntervalMsIfExactlyOnceEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportMultipleBootstrapServers STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportMultipleBootstrapServers PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSpecifyNoOptimizationWhenNotExplicitlyAddedToConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSpecifyNoOptimizationWhenNotExplicitlyAddedToConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowConfigExceptionWhenOptimizationConfigNotValueInRange STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowConfigExceptionWhenOptimizationConfigNotValueInRange PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowStreamsExceptionIfKeySerdeConfigFails STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowStreamsExceptionIfKeySerdeConfigFails PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportNonPrefixedProducerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportNonPrefixedProducerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetRestoreConsumerConfigs 
STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetRestoreConsumerConfigs 
PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowExceptionIfBootstrapServersIsNotSet STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowExceptionIfBootstrapServersIsNotSet PASSED

org.apache.kafka.streams.StreamsConfigTest > 
consumerConfigMustUseAdminClientConfigForRetries STARTED

org.apache.kafka.streams.StreamsConfigTest > 
consumerConfigMustUseAdminClientConfigForRetries PASSED

org.apache.kafka.streams.StreamsConfigTest > 
testGetMainConsumerConfigsWithMainConsumerOverridenPrefix STARTED

org.apache.kafka.streams.StreamsConfigTest > 
testGetMainConsumerConfigsWithMainConsumerOverridenPrefix PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeSupportNonPrefixedRestoreConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldBeSupportNonPrefixedRestoreConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedRestoreConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedRestoreConsumerConfigs PASSED

[jira] [Resolved] (KAFKA-2933) Failure in kafka.api.PlaintextConsumerTest.testMultiConsumerDefaultAssignment

2018-07-03 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-2933.

Resolution: Cannot Reproduce

It's been a long time since we've seen this, so I will close. We can reopen if 
it reoccurs.

> Failure in kafka.api.PlaintextConsumerTest.testMultiConsumerDefaultAssignment
> -
>
> Key: KAFKA-2933
> URL: https://issues.apache.org/jira/browse/KAFKA-2933
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.10.0.0
>Reporter: Guozhang Wang
>Assignee: Jason Gustafson
>Priority: Major
>  Labels: transient-unit-test-failure
>
> {code}
> kafka.api.PlaintextConsumerTest > testMultiConsumerDefaultAssignment FAILED
> java.lang.AssertionError: Did not get valid assignment for partitions 
> [topic1-2, topic2-0, topic1-4, topic-1, topic-0, topic2-1, topic1-0, 
> topic1-3, topic1-1, topic2-2] after we changed subscription
> at org.junit.Assert.fail(Assert.java:88)
> at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:747)
> at 
> kafka.api.PlaintextConsumerTest.validateGroupAssignment(PlaintextConsumerTest.scala:644)
> at 
> kafka.api.PlaintextConsumerTest.changeConsumerGroupSubscriptionAndValidateAssignment(PlaintextConsumerTest.scala:663)
> at 
> kafka.api.PlaintextConsumerTest.testMultiConsumerDefaultAssignment(PlaintextConsumerTest.scala:461)
> java.util.ConcurrentModificationException: KafkaConsumer is not safe for 
> multi-threaded access
> {code}
> Example: https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/1582/console



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

2018-07-03 Thread Jun Rao
Hi, Lucas, Dong,

If all disks on a broker are slow, one probably should just kill the
broker. In that case, this KIP may not help. If only one of the disks on a
broker is slow, one may want to fail that disk and move the leaders on that
disk to other brokers. In that case, being able to process the LeaderAndIsr
requests faster will potentially help the producers recover quicker.

Thanks,

Jun

On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin  wrote:

> Hey Lucas,
>
> Thanks for the reply. Some follow up questions below.
>
> Regarding 1, if each ProduceRequest covers 20 partitions that are randomly
> distributed across all partitions, then each ProduceRequest will likely
> cover some partitions for which the broker is still leader after it quickly
> processes the
> LeaderAndIsrRequest. Then broker will still be slow in processing these
> ProduceRequest and request will still be very high with this KIP. It seems
> that most ProduceRequest will still timeout after 30 seconds. Is this
> understanding correct?
>
> Regarding 2, if most ProduceRequest will still timeout after 30 seconds,
> then it is less clear how this KIP reduces average produce latency. Can you
> clarify what metrics can be improved by this KIP?
>
> Not sure why system operator directly cares number of truncated messages.
> Do you mean this KIP can improve average throughput or reduce message
> duplication? It will be good to understand this.
>
> Thanks,
> Dong
>
>
>
>
>
> On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang  wrote:
>
> > Hi Dong,
> >
> > Thanks for your valuable comments. Please see my reply below.
> >
> > 1. The Google doc showed only 1 partition. Now let's consider a more
> common
> > scenario
> > where broker0 is the leader of many partitions. And let's say for some
> > reason its IO becomes slow.
> > The number of leader partitions on broker0 is so large, say 10K, that the
> > cluster is skewed,
> > and the operator would like to shift the leadership for a lot of
> > partitions, say 9K, to other brokers,
> > either manually or through some service like cruise control.
> > With this KIP, not only will the leadership transitions finish more
> > quickly, helping the cluster itself becoming more balanced,
> > but all existing producers corresponding to the 9K partitions will get
> the
> > errors relatively quickly
> > rather than relying on their timeout, thanks to the batched async ZK
> > operations.
> > To me it's a useful feature to have during such troublesome times.
> >
> >
> > 2. The experiments in the Google Doc have shown that with this KIP many
> > producers
> > receive an explicit error NotLeaderForPartition, based on which they
> retry
> > immediately.
> > Therefore the latency (~14 seconds+quick retry) for their single message
> is
> > much smaller
> > compared with the case of timing out without the KIP (30 seconds for
> timing
> > out + quick retry).
> > One might argue that reducing the timing out on the producer side can
> > achieve the same result,
> > yet reducing the timeout has its own drawbacks[1].
> >
> > Also *IF* there were a metric to show the number of truncated messages on
> > brokers,
> > with the experiments done in the Google Doc, it should be easy to see
> that
> > a lot fewer messages need
> > to be truncated on broker0 since the up-to-date metadata avoids appending
> > of messages
> > in subsequent PRODUCE requests. If we talk to a system operator and ask
> > whether
> > they prefer fewer wasteful IOs, I bet most likely the answer is yes.
> >
> > 3. To answer your question, I think it might be helpful to construct some
> > formulas.
> > To simplify the modeling, I'm going back to the case where there is only
> > ONE partition involved.
> > Following the experiments in the Google Doc, let's say broker0 becomes
> the
> > follower at time t0,
> > and after t0 there were still N produce requests in its request queue.
> > With the up-to-date metadata brought by this KIP, broker0 can reply with
> an
> > NotLeaderForPartition exception,
> > let's use M1 to denote the average processing time of replying with such
> an
> > error message.
> > Without this KIP, the broker will need to append messages to segments,
> > which may trigger a flush to disk,
> > let's use M2 to denote the average processing time for such logic.
> > Then the average extra latency incurred without this KIP is N * (M2 - M1) / 2.
> >
> > In practice, M2 should always be larger than M1, which means as long as N
> > is positive,
> > we would see improvements on the average latency.
> > There does not need to be significant backlog of requests in the request
> > queue,
> > or severe degradation of disk performance to have the improvement.
> >
> > Regards,
> > Lucas
> >
> >
> > [1] For instance, reducing the timeout on the producer side can trigger
> > unnecessary duplicate requests
> > when the corresponding leader broker is overloaded, exacerbating the
> > situation.
> >
> > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin  wrote:
> >
> > > Hey Lucas,
> 
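
For a concrete feel of the model in the reply quoted above, here is a minimal, self-contained sketch that plugs illustrative numbers into N * (M2 - M1) / 2. The values of N, M1 and M2 below are made up for illustration and are not taken from the Google Doc experiments.

public class ExtraLatencyEstimate {
    public static void main(String[] args) {
        int n = 1000;      // produce requests already queued when broker0 becomes the follower
        double m1 = 0.5;   // ms: average time to reply with NotLeaderForPartition
        double m2 = 5.0;   // ms: average time to append the messages (and possibly flush)

        // Average extra latency without the KIP, as derived in the thread:
        double extraLatencyMs = n * (m2 - m1) / 2;
        System.out.printf("Average extra latency without the KIP: %.1f ms%n", extraLatencyMs);
        // With these sample values the estimate is 2250.0 ms; any positive backlog N
        // already yields an improvement as long as M2 > M1.
    }
}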

Re: [ANNOUNCE] Apache Kafka 0.11.0.3 Released

2018-07-03 Thread Jason Gustafson
Awesome. Thanks Matthias!

On Tue, Jul 3, 2018 at 12:44 PM, Yishun Guan  wrote:

> Nice! Thanks~
>
> On Tue, Jul 3, 2018, 12:16 PM Ismael Juma  wrote:
>
> > Thanks Matthias!
> >
> > On Tue, 3 Jul 2018, 11:31 Matthias J. Sax,  wrote:
> >
> > >
> > > The Apache Kafka community is pleased to announce the release for
> > > Apache Kafka 0.11.0.3.
> > >
> > >
> > > This is a bug fix release and it includes fixes and improvements from
> > > 27 JIRAs, including a few critical bugs.
> > >
> > >
> > > All of the changes in this release can be found in the release notes:
> > >
> > >
> > > https://dist.apache.org/repos/dist/release/kafka/0.11.0.3/RELEASE_NOTES.html
> > >
> > >
> > >
> > > You can download the source release from:
> > >
> > >
> > > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka-0.11.0.3-src.tgz
> > >
> > >
> > > and binary releases from:
> > >
> > >
> > > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.11-0.11.0.3.tgz
> > > (Scala 2.11)
> > >
> > > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.3/kafka_2.12-0.11.0.3.tgz
> > > (Scala 2.12)
> > >
> > >
> > > ---
> > >
> > >
> > > Apache Kafka is a distributed streaming platform with four core APIs:
> > >
> > >
> > > ** The Producer API allows an application to publish a stream of records
> > > to one or more Kafka topics.
> > >
> > >
> > > ** The Consumer API allows an application to subscribe to one or more
> > > topics and process the stream of records produced to them.
> > >
> > >
> > > ** The Streams API allows an application to act as a stream processor,
> > > consuming an input stream from one or more topics and producing an
> > > output stream to one or more output topics, effectively transforming
> > > the input streams to output streams.
> > >
> > >
> > > ** The Connector API allows building and running reusable producers or
> > > consumers that connect Kafka topics to existing applications or data
> > > systems. For example, a connector to a relational database might
> > > capture every change to a table.
> > >
> > >
> > >
> > > With these APIs, Kafka can be used for two broad classes of
> application:
> > >
> > >
> > > ** Building real-time streaming data pipelines that reliably get data
> > > between systems or applications.
> > >
> > >
> > > ** Building real-time streaming applications that transform or react
> > > to the streams of data.
> > >
> > >
> > >
> > > Apache Kafka is in use at large and small companies worldwide,
> > > including Capital One, Goldman Sachs, ING, LinkedIn, Netflix,
> > > Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and
> > > Zalando, among others.
> > >
> > >
> > >
> > > A big thank you for the following 26 contributors to this release!
> > >
> > >
> > > Matthias J. Sax, Ewen Cheslack-Postava, Konstantine Karantasis,
> > > Guozhang Wang, Rajini Sivaram, Randall Hauch, tedyu, Jagadesh
> > > Adireddi, Jarek Rudzinski, Jason Gustafson, Jeremy Custenborder, Anna
> > > Povzner, Joel Hamill, John Roesler, Max Zheng, Mickael Maison, Robert
> > > Yokota, Yaswanth Kumar, parafiend, Jiangjie (Becket) Qin, Arjun
> > > Satish, Bill Bejeck, Damian Guy, Gitomain, Gunnar Morling, Ismael Juma
> > >
> > >
> > > We welcome your help and feedback. For more information on how to
> > > report problems, and to get involved, visit the project website at
> > > http://kafka.apache.org/
> > >
> > >
> > > Thank you!
> > >
> > >
> > > Regards,
> > >  -Matthias
> > >
> >
>


Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-03 Thread Jason Gustafson
Sorry to join the discussion late. Can you add to the motivation the use
cases for header-based compaction? They are not very clear to me.

Thanks,
Jason

On Mon, Jul 2, 2018 at 11:00 AM, Guozhang Wang  wrote:

> Hi Luis,
>
> I believe that compaction property is indeed overridable at per-topic
> level, as in
>
> https://github.com/apache/kafka/blob/0cacbcf30e0a90ab9fad7bc310e5477cf959f1fd/clients/src/main/java/org/apache/kafka/common/config/TopicConfig.java#L116
>
> And also documented in https://kafka.apache.org/documentation/#topicconfigs
>
> Is that right?
>
>
>
> Guozhang
>
> On Mon, Jul 2, 2018 at 7:41 AM, Luís Cabral  >
> wrote:
>
> >  Hi Guozhang,
> >
> > You are right that it is not straightforward to add a dependent property
> > validation.
> > Though it is possible to re-design it to allow for this, that effort
> would
> > be better placed under its own KIP, if it really becomes useful for other
> > properties as well.
> > Given this, the fallback-to-offset behaviour currently documented will be
> > used.
> >
> > Also, while analyzing this, I noticed that the existing compaction
> > properties only exist globally, and not per topic.
> > I don't understand why this is, but it again feels like something out of
> > scope for this KIP.
> > Given this, the KIP was updated to only include the global configuration
> > properties, removing the per-topic configs.
> >
> > I'll soon update the PR according to the documentation, but I trust the
> > KIP doesn't need that to close, right?
> >
> > Cheers,
> > Luis
> >
> > On Monday, July 2, 2018, 2:00:08 PM GMT+2, Luís Cabral
> >  wrote:
> >
> >   Hi Guozhang,
> >
> > At the moment the KIP has your vote, Matthias' and Ted's.
> > Should I ask someone else to have a look?
> >
> > Cheers,
> > Luis
> >
> > On Monday, July 2, 2018, 12:16:48 PM GMT+2, Mickael Maison <
> > mickael.mai...@gmail.com> wrote:
> >
> >  +1 (non binding). Thanks for the KIP!
> >
> > On Sat, Jun 30, 2018 at 12:26 AM, Guozhang Wang 
> > wrote:
> > > Hi Luis,
> > >
> > > Regarding the minor suggest, I agree it would be better to make it as
> > > mandatory, but it might be a bit tricky because it is a conditional
> > > mandatory one depending on the other config's value. Would like to see
> > your
> > > updated PR.
> > >
> > > Regarding the KIP itself, both Matthias and myself can recast our votes
> > to
> > > the updated wiki, while we still need one more committer to vote
> > according
> > > to the bylaws.
> > >
> > >
> > > Guozhang
> > >
> > > On Thu, Jun 28, 2018 at 5:38 AM, Luís Cabral
> > 
> > > wrote:
> > >
> > >>  Hi,
> > >>
> > >> Thank you all for having a look!
> > >>
> > >> The KIP is now updated with the result of these late discussions,
> > though I
> > >> did take some liberty with this part:
> > >>
> > >>
> > >>- If the "compaction.strategy.header" configuration is not set (or
> is
> > >> blank), then the compaction strategy will fallback to "offset";
> > >>
> > >>
> > >> Alternatively, we can also set it to be a mandatory property when the
> > >> strategy is "header" and fail the application to start via a config
> > >> validation (I would honestly prefer this, but its up to your taste).
> > >>
> > >> Anyway, this is now a minute detail that can be adapted during the
> final
> > >> stage of this KIP, so are you all alright with me changing the status
> to
> > >> [ACCEPTED]?
> > >>
> > >> Cheers,
> > >> Luis
> > >>
> > >>
> > >>On Thursday, June 28, 2018, 2:08:11 PM GMT+2, Ted Yu <
> > >> yuzhih...@gmail.com> wrote:
> > >>
> > >>  +1
> > >>
> > >> On Thu, Jun 28, 2018 at 4:56 AM, Luís Cabral
> >  > >> >
> > >> wrote:
> > >>
> > >> > Hi Ted,
> > >> > Can I also get your input on this?
> > >> >
> > >> > bq. +1 from my side for using `compaction.strategy` with values
> > >> > "offset","timestamp" and "header" and `compaction.strategy.header`
> > >> > -Matthias
> > >> >
> > >> > bq. +1 from me as well.
> > >> > -Guozhang
> > >> >
> > >> >
> > >> > Cheers,
> > >> > Luis
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>
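
To make the "header" strategy under vote easier to picture, here is a small, self-contained conceptual sketch of how a cleaner could pick the surviving record per key from a configured header value, falling back to the offset when the header is missing. This is not the actual log cleaner code; the record class and field names below are invented for the example, and only the config names (compaction.strategy, compaction.strategy.header) come from the thread.

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class HeaderCompactionSketch {

    // Hypothetical stand-in for a log record; not a Kafka class.
    static class Rec {
        final String key;
        final long offset;
        final byte[] versionHeader; // value of the header named by "compaction.strategy.header"
        Rec(String key, long offset, byte[] versionHeader) {
            this.key = key; this.offset = offset; this.versionHeader = versionHeader;
        }
        long compactionValue() {
            // Assumed encoding: 8-byte big-endian long; a missing header falls back to the offset.
            return versionHeader != null && versionHeader.length == Long.BYTES
                    ? ByteBuffer.wrap(versionHeader).getLong()
                    : offset;
        }
    }

    public static void main(String[] args) {
        Rec[] log = {
            new Rec("k1", 10, longBytes(5)),
            new Rec("k1", 11, longBytes(3)),  // older version arriving later: should not survive
            new Rec("k2", 12, null)           // no header: offset decides
        };
        Map<String, Rec> survivors = new HashMap<>();
        for (Rec r : log) {
            survivors.merge(r.key, r,
                (oldR, newR) -> newR.compactionValue() >= oldR.compactionValue() ? newR : oldR);
        }
        survivors.forEach((k, r) ->
            System.out.println(k + " -> offset " + r.offset + ", value " + r.compactionValue()));
    }

    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }
}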


Build failed in Jenkins: kafka-2.0-jdk8 #69

2018-07-03 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] MINOR: Include quota related interfaces to javadocs (#5325)

--
[...truncated 869.07 KB...]
kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsNotExistingGroup 
STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsNotExistingGroup 
PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToZonedDateTime 
STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToZonedDateTime 
PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToEarliest STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToEarliest PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsExportImportPlan 
STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsExportImportPlan 
PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToSpecificOffset 
STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToSpecificOffset 
PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsShiftPlus STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsShiftPlus PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToLatest STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToLatest PASSED

kafka.admin.ResetConsumerGroupOffsetTest > 
testResetOffsetsShiftByLowerThanEarliest STARTED

kafka.admin.ResetConsumerGroupOffsetTest > 
testResetOffsetsShiftByLowerThanEarliest PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsByDuration STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsByDuration PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToLocalDateTime 
STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToLocalDateTime 
PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsExistingTopic STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsExistingTopic PASSED

kafka.admin.ResetConsumerGroupOffsetTest > 
testResetOffsetsToEarliestOnTopicsAndPartitions STARTED

kafka.admin.ResetConsumerGroupOffsetTest > 
testResetOffsetsToEarliestOnTopicsAndPartitions PASSED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToEarliestOnTopics 
STARTED

kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsToEarliestOnTopics 
PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > shouldFailIfBlankArg STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > shouldFailIfBlankArg PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowVerifyWithoutReassignmentOption STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowVerifyWithoutReassignmentOption PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowGenerateWithoutBrokersOption STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowGenerateWithoutBrokersOption PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowTopicsOptionWithVerify STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowTopicsOptionWithVerify PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowGenerateWithThrottleOption STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowGenerateWithThrottleOption PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > shouldFailIfNoArgs STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > shouldFailIfNoArgs PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowExecuteWithoutReassignmentOption STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowExecuteWithoutReassignmentOption PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowBrokersListWithVerifyOption STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowBrokersListWithVerifyOption PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldCorrectlyParseValidMinimumExecuteOptions STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldCorrectlyParseValidMinimumExecuteOptions PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowGenerateWithReassignmentOption STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowGenerateWithReassignmentOption PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldCorrectlyParseValidMinimumGenerateOptions STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldCorrectlyParseValidMinimumGenerateOptions PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowGenerateWithoutBrokersAndTopicsOptions STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowGenerateWithoutBrokersAndTopicsOptions PASSED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowThrottleWithVerifyOption STARTED

kafka.admin.ReassignPartitionsCommandArgsTest > 
shouldNotAllowThrottleWithVerifyOption PASSED

kafka.adm

Re: [VOTE] KIP-322: Return new error code for DeleteTopics API when topic deletion disabled.

2018-07-03 Thread Harsha
+1.

Thanks,
Harsha

On Tue, Jul 3rd, 2018 at 9:22 AM, Ted Yu  wrote:

> 
> 
> 
> +1
> 
> On Tue, Jul 3, 2018 at 9:05 AM, Mickael Maison < mickael.mai...@gmail.com >
> 
> wrote:
> 
> > +1 (non binding)
> > Thanks for the KIP
> >
> > On Tue, Jul 3, 2018 at 4:59 PM, Vahid S Hashemian
> > < vahidhashem...@us.ibm.com > wrote:
> > > +1 (non-binding)
> > >
> > > --Vahid
> > >
> > >
> > >
> > > From: Gwen Shapira < g...@confluent.io >
> > > To: dev < dev@kafka.apache.org >
> > > Date: 07/03/2018 08:49 AM
> > > Subject: Re: [VOTE] KIP-322: Return new error code for
> > DeleteTopics
> > > API when topic deletion disabled.
> > >
> > >
> > >
> > > +1
> > >
> > > On Tue, Jul 3, 2018 at 8:24 AM, Manikumar < manikumar.re...@gmail.com >
> 
> > > wrote:
> > >
> > >> Manikumar < manikumar.re...@gmail.com >
> > >> Fri, Jun 29, 7:59 PM (4 days ago)
> > >> to dev
> > >> Hi All,
> > >>
> > >> I would like to start voting on KIP-322 which would return new error
> > > code
> > >> for DeleteTopics API when topic deletion disabled.
> > >>
> > >>
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87295558
> > >
> > >>
> > >> Thanks,
> > >>
> > >
> > >
> > >
> > > --
> > > *Gwen Shapira*
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter <
> > > https://twitter.com/ConfluentInc
> > >> | blog
> > > <
> > > http://www.confluent.io/blog
> > >>
> > >
> > >
> > >
> > >
> >
> 
> 
> 
>

Re: [VOTE] KIP-280: Enhanced log compaction

2018-07-03 Thread Jun Rao
Hi, Luis,

Thanks for the KIP. Overall, this seems a useful KIP. A few comments below.

1. I guess both new configurations will be at the topic level?
2. Since the log cleaner now needs to keep both the offset and another long
(say timestamp) in the de-dup map, it reduces the number of keys that we
can keep in the map and thus may require more rounds of cleaning. This is
probably not a big issue, but it will be useful to document this impact in
the KIP.
3. With the new cleaning strategy, we want to be a bit careful with
removing the last message in a partition (which is possible now). We need
to preserve the offset of the last message so that we don't reuse the
offset for a different message. One way is to simply never remove the last
message. Another way (suggested by Jason) is to create an empty message
batch.

Jun

On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral 
wrote:

> Hi all,
>
> Any takers on having a look at this KIP and voting on it?
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>
> Cheers,
> Luis
>
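
To put rough numbers on point 2 above (the extra long per entry in the de-dup map), here is a small back-of-the-envelope calculation. It assumes the cleaner's current entry layout of roughly 24 bytes (a 16-byte key hash plus an 8-byte offset) and one extra 8-byte value for the new strategy; the buffer size is only an example.

public class DedupeMapCapacity {
    public static void main(String[] args) {
        long bufferBytes = 128L * 1024 * 1024;  // e.g. log.cleaner.dedupe.buffer.size = 128 MB
        int entryToday = 16 + 8;                // key hash + offset
        int entryWithExtraLong = 16 + 8 + 8;    // key hash + offset + timestamp/header value

        System.out.println("keys per cleaning pass today:    " + bufferBytes / entryToday);
        System.out.println("keys per cleaning pass with KIP: " + bufferBytes / entryWithExtraLong);
        // Roughly 5.6M vs 4.2M unique keys, i.e. about 25% fewer keys per pass,
        // which is why more cleaning rounds may be needed for the same keyspace.
    }
}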


Build failed in Jenkins: kafka-0.11.0-jdk7 #389

2018-07-03 Thread Apache Jenkins Server
See 


Changes:

[jason] MINOR: Fix kafkatest snapshot version for 0.11.0.4

--
[...truncated 1.56 MB...]
org.apache.kafka.streams.kstream.internals.KStreamImplTest > 
shouldNotAllowNullFilePathOnWriteAsText PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testOuterJoin STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testOuterJoin PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > testJoin 
STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testWindowing STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testWindowing PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testAsymetricWindowingBefore STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testAsymetricWindowingBefore PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testAsymetricWindowingAfter STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamJoinTest > 
testAsymetricWindowingAfter PASSED

org.apache.kafka.streams.kstream.internals.KStreamWindowAggregateTest > 
testAggBasic STARTED

org.apache.kafka.streams.kstream.internals.KStreamWindowAggregateTest > 
testAggBasic PASSED

org.apache.kafka.streams.kstream.internals.KStreamWindowAggregateTest > 
testJoin STARTED

org.apache.kafka.streams.kstream.internals.KStreamWindowAggregateTest > 
testJoin PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldConvertToBinaryAndBack STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldConvertToBinaryAndBack PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractStartTimeFromBinary STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractStartTimeFromBinary PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldDeSerializeNullToNull STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldDeSerializeNullToNull PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldSerializeDeserialize STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldSerializeDeserialize PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldDeSerializeEmtpyByteArrayToNull STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldDeSerializeEmtpyByteArrayToNull PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractWindowFromBindary STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractWindowFromBindary PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractKeyFromBinary STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractKeyFromBinary PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractKeyBytesFromBinary STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractKeyBytesFromBinary PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractEndTimeFromBinary STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractEndTimeFromBinary PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldSerializeNullToNull STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldSerializeNullToNull PASSED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractBytesKeyFromBinary STARTED

org.apache.kafka.streams.kstream.internals.SessionKeySerdeTest > 
shouldExtractBytesKeyFromBinary PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testAggBasic 
STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testAggBasic 
PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
shouldForwardToCorrectProcessorNodeWhenMultiCacheEvictions STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
shouldForwardToCorrectProcessorNodeWhenMultiCacheEvictions PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testCount 
STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > testCount 
PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testCountWithInternalStore STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testCountWithInternalStore PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggCoalesced STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggCoalesced PASSED

org.apache.kafka.stream

Jenkins build is back to normal : kafka-trunk-jdk8 #2790

2018-07-03 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

2018-07-03 Thread Dong Lin
Hey Jason,

Thanks much for your thoughtful explanation.

Yes the solution using findOffsets(offset, leaderEpoch) also works. The
advantage of this solution it adds only one API instead of two APIs. The
concern is that its usage seems a bit more clumsy for advanced users. More
specifically, advanced users who store offsets externally will always need
to call findOffsets() before calling seek(offset) during consumer
initialization. And those advanced users will need to manually keep track
of the leaderEpoch of the last ConsumerRecord.

The other solution may be more user-friendly for advanced users is to add
two APIs, `void seek(offset, leaderEpoch)` and `(offset, epoch) =
offsetEpochs(topicPartition)`.

I kind of prefer the second solution because it is easier to use for
advanced users. If we need to expose leaderEpoch anyway to safely identify
a message, it may be conceptually simpler to expose it directly in
seek(...) rather than requiring one more translation using
findOffsets(...). But I am also OK with the first solution if other
developers also favor that one :)

Thanks,
Dong


On Mon, Jul 2, 2018 at 11:10 AM, Jason Gustafson  wrote:

> Hi Dong,
>
> Thanks, I've been thinking about your suggestions a bit. It is challenging
> to make this work given the current APIs. One of the difficulties is that
> we don't have an API to find the leader epoch for a given offset at the
> moment. So if the user does a seek to offset 5, then we'll need a new API
> to find the corresponding epoch in order to fulfill the new position() API.
> Potentially we could modify ListOffsets to enable finding the leader epoch,
> but I am not sure it is worthwhile. Perhaps it is reasonable for advanced
> usage to expect that the epoch information, if needed, will be extracted
> from the records directly? It might make sense to expose a helper in
> `ConsumerRecords` to make this a little easier though.
>
> Alternatively, if we think it is important to have this information exposed
> directly, we could create batch APIs to solve the naming problem. For
> example:
>
> Map positions();
> void seek(Map positions);
>
> However, I'm actually leaning toward leaving the seek() and position() APIs
> unchanged. Instead, we can add a new API to search for offset by timestamp
> or by offset/leader epoch. Let's say we call it `findOffsets`. If the user
> hits a log truncation error, they can use this API to find the closest
> offset and then do a seek(). At the same time, we deprecate the
> `offsetsForTimes` APIs. We now have two use cases which require finding
> offsets, so I think we should make this API general and leave the door open
> for future extensions.
>
> By the way, I'm unclear about the desire to move part of this functionality
> to AdminClient. Guozhang suggested this previously, but I think it only
> makes sense for cross-cutting capabilities such as topic creation. If we
> have an API which is primarily useful by consumers, then I think that's
> where it should be exposed. The AdminClient also has its own API integrity
> and should not become a dumping ground for advanced use cases. I'll update
> the KIP with the  `findOffsets` API suggested above and we can see if it
> does a good enough job of keeping the API simple for common cases.
>
> Thanks,
> Jason
>
>
> On Sat, Jun 30, 2018 at 4:39 AM, Dong Lin  wrote:
>
> > Hey Jason,
> >
> > Regarding seek(...), it seems that we want an API for user to initialize
> > consumer with (offset, leaderEpoch) and that API should allow throwing
> > PartitionTruncationException. Suppose we agree on this, then
> > seekToNearest() is not sufficient because it will always swallow
> > PartitionTruncationException. Here we have two options. The first option
> is
> > to add API offsetsForLeaderEpochs() to translate (leaderEpoch, offset) to
> > offset. The second option is to have add seek(offset, leaderEpoch). It
> > seems that second option may be more simpler because it makes it clear
> that
> > (offset, leaderEpoch) will be used to identify consumer's position in a
> > partition. And user only needs to handle PartitionTruncationException
> from
> > the poll(). In comparison the first option seems a bit harder to use
> > because user have to also handle the PartitionTruncationException if
> > offsetsForLeaderEpochs() returns different offset from user-provided
> > offset. What do you think?
> >
> > If we decide to add API seek(offset, leaderEpoch), then we can decide
> > whether and how to add API to translate (offset, leaderEpoch) to offset.
> It
> > seems that this API will be needed by advanced user to don't want auto
> > offset reset (so that it can be notified) but still wants to reset offset
> > to closest. For those users if probably makes sense to only have the API
> in
> > AdminClient. offsetsForTimes() seems like a common API that will be
> needed
> > by user's of consumer in general, so it may be more reasonable to stay in
> > the consumer API. I don't have a strong opinion on whether
> 
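
For illustration, here is a sketch of how the findOffsets-then-seek flow discussed above could look from an application's point of view. LogTruncationException and findOffsets are proposals from this thread and do not exist in the consumer API today, so they appear below only as placeholders; seek() and poll() are the existing APIs.

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class TruncationHandlingSketch {

    // Placeholder for the exception type proposed in the discussion.
    static class LogTruncationException extends RuntimeException {
        final TopicPartition partition;
        final long offset;
        final int leaderEpoch;
        LogTruncationException(TopicPartition partition, long offset, int leaderEpoch) {
            this.partition = partition; this.offset = offset; this.leaderEpoch = leaderEpoch;
        }
    }

    static void pollLoop(KafkaConsumer<String, String> consumer) {
        while (true) {
            try {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(record -> { /* process */ });
            } catch (LogTruncationException e) {
                // Proposed flow: ask for the closest valid offset for the last seen
                // (offset, leaderEpoch) and resume from there.
                // long resumeOffset = consumer.findOffsets(...);  // proposed API, not in Kafka today
                long resumeOffset = e.offset;                      // placeholder for the sketch
                consumer.seek(e.partition, resumeOffset);          // existing API
            }
        }
    }
}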

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

2018-07-03 Thread Lucas Wang
@Ted
For #1, it's probably hard to predict M since it also depends on the
hardware.
I'm not sure how to use the suggested formula for the default value if we
don't know M.
Also TO is the default timeout we want to figure out, and the formula seems
to be recursive.
I'd suggest we stay with the current default value of 300 milliseconds, and
address it separately
if it turns out to be a problem. What do you think?

#2, please try this link and see if it works now:
https://drive.google.com/file/d/1QbPDqfT59A2X4To2p3OfD5YeJR8aWDK7/view?usp=sharing

Regards,
Lucas


On Mon, Jul 2, 2018 at 5:52 PM, Ted Yu  wrote:

> For #1, I don't know what would be good approximation for M.
> Maybe use max((TO / 2) / N, M / N) as default value for poll timeout ?
>
> For #2, I don't see the picture in email :-)
> Can you use third party website ?
>
> Thanks
>
> On Mon, Jul 2, 2018 at 5:17 PM, Lucas Wang  wrote:
>
> > Hi Ted,
> >
> > 1. I'm neutral on making the poll timeout parameter configurable.
> > Mainly because as a config, it could be confusing for operators who try
> to
> > choose a value for it.
> >
> > To understand the implication of this value better,
> > let's use TO to represent the timeout value under discussion,
> > M to denote the processing time of data requests,
> > and N to be the number of io threads.
> >
> > - If the data request queue is empty and there is no incoming data
> > requests,
> >   all io threads should be blocked on the data request queue, and
> >   the average delay for a controller request is (TO / 2) / N, and the
> > worst case delay is TO.
> > - If all IO threads are busy processing data requests, then the average
> > latency for a controller request is M / N.
> > - In the worst case, a controller request can just miss the train, and IO
> > threads get blocked on data request queue
> >   for TO, at the end of which they all receive a new incoming data
> > request, the latency for the
> >   controller request can be TO + M.
> >
> > Given the intricacies, what do you think about choosing a relatively
> > meaningful value and stick with it,
> > rather than exposing it as a config?
> >
> > 2. Sorry for losing the format of the table, I've attached it below as a
> > picture
> >
> >
> > Regards,
> > Lucas
> >
> > On Fri, Jun 29, 2018 at 5:28 PM, Ted Yu  wrote:
> >
> >> bq. which is hard coded to be 300 milliseconds
> >>
> >> Have you considered making the duration configurable ?
> >>
> >> The comparison at the end of your email seems to be copied where tabular
> >> form is lost.
> >> Do you mind posting that part again ?
> >>
> >> Thanks
> >>
> >> On Fri, Jun 29, 2018 at 4:53 PM, Lucas Wang 
> >> wrote:
> >>
> >> > Hi Jun,
> >> >
> >> > Thanks for your comments.
> >> > 1. I just replied in the discussion thread about the positive change
> >> this
> >> > KIP can still bring
> >> > if implemented on the latest trunk, which includes the async ZK
> >> operations
> >> > for KAFKA-5642.
> >> > The evaluation is done using an integration test.
> >> > In production, we have not upgraded to Kafka 1.1 yet, and the code we
> >> are
> >> > currently running does
> >> > not include async ZK operations, therefore I don't have any real usage
> >> > result.
> >> >
> >> > 2. Thanks for bringing this up. I haven't considered this setting, and
> >> the
> >> > existing proposal in this KIP
> >> > would make data requests and controller requests share a memory poll
> of
> >> > size specified by the config
> >> > queued.max.request.bytes. The downside is that if there is memory
> >> pressure,
> >> > controller requests may be blocked
> >> > from being read from a socket and does not get prioritized at the
> socket
> >> > layer.
> >> >
> >> > If we have a separate bytes limit for the controller requests, I
> imagine
> >> > there would be a separate memory pool
> >> > dedicated to controller requests. Also it requires the processors to
> >> tell
> >> > connections from a controller apart
> >> > from connections from other brokers or clients, which would probably
> >> > require a dedicated port for the controller?
> >> > IMO, this change is mainly driven by the memory pressure, kind of an
> >> > orthogonal issue, and we can address it with a separate KIP
> >> > if desired. Please let me know what you think.
> >> >
> >> > 3. I plan to change the implementation of the method
> >> > receiveRequest(timeout: Long) in the RequestChannel class as follows:
> >> >
> >> > val controllerRequest = controllerRequestQueue.poll()
> >> > if (controllerRequest != null) {
> >> >   controllerRequest
> >> > } else {
> >> >   dataRequestQueue.poll(timeout, TimeUnit.MILLISECONDS)
> >> > }
> >> >
> >> > with this implementation, there is no need to explicitly choose a
> >> request
> >> > handler thread to wake up depending on
> >> > the types of request enqueued, and if a controller request arrives
> while
> >> > some request handler threads are blocked on an empty data request
> queue,
> >> > they will simply timeout and call the receiveReque
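
A self-contained Java rendering of the two-queue polling idea in the snippet quoted above, using the queue names from that snippet. This is only a sketch of the behaviour being described, not the actual RequestChannel change.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PrioritizedRequestChannelSketch<R> {
    private final BlockingQueue<R> controllerRequestQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<R> dataRequestQueue = new LinkedBlockingQueue<>();

    // Controller requests are always drained first; handler threads only block
    // (up to timeoutMs) on the data queue, so a newly arrived controller request
    // is picked up after at most one timeout interval.
    public R receiveRequest(long timeoutMs) throws InterruptedException {
        R controllerRequest = controllerRequestQueue.poll();
        if (controllerRequest != null) {
            return controllerRequest;
        }
        return dataRequestQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public void sendControllerRequest(R request) { controllerRequestQueue.add(request); }
    public void sendDataRequest(R request) { dataRequestQueue.add(request); }
}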

[jira] [Created] (KAFKA-7131) Update release script to generate announcement email text

2018-07-03 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-7131:
--

 Summary: Update release script to generate announcement email text
 Key: KAFKA-7131
 URL: https://issues.apache.org/jira/browse/KAFKA-7131
 Project: Kafka
  Issue Type: Improvement
Reporter: Matthias J. Sax


When a release is finalized, we send out an email to announce the release. At the
moment, we have a template in the wiki
(https://cwiki.apache.org/confluence/display/KAFKA/Release+Process).
However, the template needs some manual changes to fill in the release number,
number of contributors, etc.

Some parts could be automated – the corresponding commands are documented in the
wiki already.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

2018-07-03 Thread Lucas Wang
Thanks for the insightful comment, Jun.

@Dong,
Since both of the two comments in your previous email are about the
benefits of this KIP and whether it's useful,
in light of Jun's last comment, do you agree that this KIP can be
beneficial in the case mentioned by Jun?
Please let me know, thanks!

Regards,
Lucas

On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao  wrote:

> Hi, Lucas, Dong,
>
> If all disks on a broker are slow, one probably should just kill the
> broker. In that case, this KIP may not help. If only one of the disks on a
> broker is slow, one may want to fail that disk and move the leaders on that
> disk to other brokers. In that case, being able to process the LeaderAndIsr
> requests faster will potentially help the producers recover quicker.
>
> Thanks,
>
> Jun
>
> On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin  wrote:
>
> > Hey Lucas,
> >
> > Thanks for the reply. Some follow up questions below.
> >
> > Regarding 1, if each ProduceRequest covers 20 partitions that are
> randomly
> > distributed across all partitions, then each ProduceRequest will likely
> > cover some partitions for which the broker is still leader after it
> quickly
> > processes the
> > LeaderAndIsrRequest. Then broker will still be slow in processing these
> > ProduceRequest and request will still be very high with this KIP. It
> seems
> > that most ProduceRequest will still timeout after 30 seconds. Is this
> > understanding correct?
> >
> > Regarding 2, if most ProduceRequest will still timeout after 30 seconds,
> > then it is less clear how this KIP reduces average produce latency. Can
> you
> > clarify what metrics can be improved by this KIP?
> >
> > Not sure why system operator directly cares number of truncated messages.
> > Do you mean this KIP can improve average throughput or reduce message
> > duplication? It will be good to understand this.
> >
> > Thanks,
> > Dong
> >
> >
> >
> >
> >
> > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang  wrote:
> >
> > > Hi Dong,
> > >
> > > Thanks for your valuable comments. Please see my reply below.
> > >
> > > 1. The Google doc showed only 1 partition. Now let's consider a more
> > common
> > > scenario
> > > where broker0 is the leader of many partitions. And let's say for some
> > > reason its IO becomes slow.
> > > The number of leader partitions on broker0 is so large, say 10K, that
> the
> > > cluster is skewed,
> > > and the operator would like to shift the leadership for a lot of
> > > partitions, say 9K, to other brokers,
> > > either manually or through some service like cruise control.
> > > With this KIP, not only will the leadership transitions finish more
> > > quickly, helping the cluster itself becoming more balanced,
> > > but all existing producers corresponding to the 9K partitions will get
> > the
> > > errors relatively quickly
> > > rather than relying on their timeout, thanks to the batched async ZK
> > > operations.
> > > To me it's a useful feature to have during such troublesome times.
> > >
> > >
> > > 2. The experiments in the Google Doc have shown that with this KIP many
> > > producers
> > > receive an explicit error NotLeaderForPartition, based on which they
> > retry
> > > immediately.
> > > Therefore the latency (~14 seconds+quick retry) for their single
> message
> > is
> > > much smaller
> > > compared with the case of timing out without the KIP (30 seconds for
> > timing
> > > out + quick retry).
> > > One might argue that reducing the timing out on the producer side can
> > > achieve the same result,
> > > yet reducing the timeout has its own drawbacks[1].
> > >
> > > Also *IF* there were a metric to show the number of truncated messages
> on
> > > brokers,
> > > with the experiments done in the Google Doc, it should be easy to see
> > that
> > > a lot fewer messages need
> > > to be truncated on broker0 since the up-to-date metadata avoids
> appending
> > > of messages
> > > in subsequent PRODUCE requests. If we talk to a system operator and ask
> > > whether
> > > they prefer fewer wasteful IOs, I bet most likely the answer is yes.
> > >
> > > 3. To answer your question, I think it might be helpful to construct
> some
> > > formulas.
> > > To simplify the modeling, I'm going back to the case where there is
> only
> > > ONE partition involved.
> > > Following the experiments in the Google Doc, let's say broker0 becomes
> > the
> > > follower at time t0,
> > > and after t0 there were still N produce requests in its request queue.
> > > With the up-to-date metadata brought by this KIP, broker0 can reply
> with
> > an
> > > NotLeaderForPartition exception,
> > > let's use M1 to denote the average processing time of replying with
> such
> > an
> > > error message.
> > > Without this KIP, the broker will need to append messages to segments,
> > > which may trigger a flush to disk,
> > > let's use M2 to denote the average processing time for such logic.
> > > Then the average extra latency incurred without this KIP is N * (M2 - M1) / 2.
> > >
> > > In p

Build failed in Jenkins: kafka-0.10.2-jdk7 #224

2018-07-03 Thread Apache Jenkins Server
See 


Changes:

[jason] MINOR: Fix race condition in TestVerifiableProducer sanity test

[jason] MINOR: Fix kafkatest snapshot version for 0.10.2.3

[github] MINOR: update for 0.10.2.2 release (#5327)

--
[...truncated 1.33 MB...]

org.apache.kafka.streams.processor.internals.StateConsumerTest > 
shouldFlushStoreWhenFlushIntervalHasLapsed STARTED

org.apache.kafka.streams.processor.internals.StateConsumerTest > 
shouldFlushStoreWhenFlushIntervalHasLapsed PASSED

org.apache.kafka.streams.processor.internals.StateConsumerTest > 
shouldUpdateStateWithReceivedRecordsForAllTopicPartition STARTED

org.apache.kafka.streams.processor.internals.StateConsumerTest > 
shouldUpdateStateWithReceivedRecordsForAllTopicPartition PASSED

org.apache.kafka.streams.processor.internals.StateConsumerTest > 
shouldAssignPartitionsToConsumer STARTED

org.apache.kafka.streams.processor.internals.StateConsumerTest > 
shouldAssignPartitionsToConsumer PASSED

org.apache.kafka.streams.processor.internals.StateConsumerTest > 
shouldCloseStateMaintainer STARTED

org.apache.kafka.streams.processor.internals.StateConsumerTest > 
shouldCloseStateMaintainer PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldNotNullPointerWhenStandbyTasksAssignedAndNoStateStoresForTopology STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldNotNullPointerWhenStandbyTasksAssignedAndNoStateStoresForTopology PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testInjectClients STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testInjectClients PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldCloseActiveTasksThatAreAssignedToThisStreamThreadButAssignmentHasChangedBeforeCreatingNewTasks
 STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldCloseActiveTasksThatAreAssignedToThisStreamThreadButAssignmentHasChangedBeforeCreatingNewTasks
 PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldNotViolateAtLeastOnceWhenExceptionOccursDuringFlushStateWhileSuspendingState
 STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldNotViolateAtLeastOnceWhenExceptionOccursDuringFlushStateWhileSuspendingState
 PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldInitializeRestoreConsumerWithOffsetsFromStandbyTasks STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldInitializeRestoreConsumerWithOffsetsFromStandbyTasks PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldReleaseStateDirLockIfFailureOnStandbyTaskCloseForUnassignedSuspendedStandbyTask
 STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldReleaseStateDirLockIfFailureOnStandbyTaskCloseForUnassignedSuspendedStandbyTask
 PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testHandingOverTaskFromOneToAnotherThread STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testHandingOverTaskFromOneToAnotherThread PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > testMaybeCommit 
STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > testMaybeCommit 
PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testStateChangeStartClose STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testStateChangeStartClose PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldNotViolateAtLeastOnceWhenAnExceptionOccursOnTaskCloseDuringShutdown 
STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldNotViolateAtLeastOnceWhenAnExceptionOccursOnTaskCloseDuringShutdown PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldNotViolateAtLeastOnceWhenExceptionOccursDuringCloseTopologyWhenSuspendingState
 STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldNotViolateAtLeastOnceWhenExceptionOccursDuringCloseTopologyWhenSuspendingState
 PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testPartitionAssignmentChange STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
testPartitionAssignmentChange PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > testMaybeClean 
STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > testMaybeClean 
PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldReleaseStateDirLockIfFailureOnTaskCloseForUnassignedSuspendedTask STARTED

org.apache.kafka.streams.processor.internals.StreamThreadTest > 
shouldReleaseStateDirLockIfFailureOnTaskCloseForUnassignedSuspendedTask PASSED

org.apache.kafka.streams.processor.internals.StreamThreadTest > testMetrics 
STARTED

org.apache.kafka

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

2018-07-03 Thread Ted Yu
For #1, I agree that obtaining good default is not trivial. We can revisit
in the future.

For #2, the table is readable.

Thanks

On Tue, Jul 3, 2018 at 4:23 PM, Lucas Wang  wrote:

> @Ted
> For #1, it's probably hard to predict M since it also depends on the
> hardware.
> I'm not sure how to use the suggested formula for the default value if we
> don't know M.
> Also TO is the default timeout we want to figure out, and the formula seems
> to be recursive.
> I'd suggest we stay with the current default value of 300 milliseconds, and
> address it separately
> if it turns out to be a problem. What do you think?
>
> #2, please try this link and see if it works now:
> https://drive.google.com/file/d/1QbPDqfT59A2X4To2p3OfD5YeJR8aWDK7/view?usp=sharing
>
> Regards,
> Lucas
>
>
> On Mon, Jul 2, 2018 at 5:52 PM, Ted Yu  wrote:
>
> > For #1, I don't know what would be good approximation for M.
> > Maybe use max((TO / 2) / N, M / N) as default value for poll timeout ?
> >
> > For #2, I don't see the picture in email :-)
> > Can you use third party website ?
> >
> > Thanks
> >
> > On Mon, Jul 2, 2018 at 5:17 PM, Lucas Wang 
> wrote:
> >
> > > Hi Ted,
> > >
> > > 1. I'm neutral on making the poll timeout parameter configurable.
> > > Mainly because as a config, it could be confusing for operators who try
> > to
> > > choose a value for it.
> > >
> > > To understand the implication of this value better,
> > > let's use TO to represent the timeout value under discussion,
> > > M to denote the processing time of data requests,
> > > and N to be the number of io threads.
> > >
> > > - If the data request queue is empty and there is no incoming data
> > > requests,
> > >   all io threads should be blocked on the data request queue, and
> > >   the average delay for a controller request is (TO / 2) / N, and the
> > > worst case delay is TO.
> > > - If all IO threads are busy processing data requests, then the average
> > > latency for a controller request is M / N.
> > > - In the worst case, a controller request can just miss the train, and
> IO
> > > threads get blocked on data request queue
> > >   for TO, at the end of which they all receive a new incoming data
> > > request, the latency for the
> > >   controller request can be TO + M.
> > >
> > > Given the intricacies, what do you think about choosing a relatively
> > > meaningful value and stick with it,
> > > rather than exposing it as a config?
> > >
> > > 2. Sorry for losing the format of the table, I've attached it below as
> a
> > > picture
> > >
> > >
> > > Regards,
> > > Lucas
> > >
> > > On Fri, Jun 29, 2018 at 5:28 PM, Ted Yu  wrote:
> > >
> > >> bq. which is hard coded to be 300 milliseconds
> > >>
> > >> Have you considered making the duration configurable ?
> > >>
> > >> The comparison at the end of your email seems to be copied where
> tabular
> > >> form is lost.
> > >> Do you mind posting that part again ?
> > >>
> > >> Thanks
> > >>
> > >> On Fri, Jun 29, 2018 at 4:53 PM, Lucas Wang 
> > >> wrote:
> > >>
> > >> > Hi Jun,
> > >> >
> > >> > Thanks for your comments.
> > >> > 1. I just replied in the discussion thread about the positive change
> > >> this
> > >> > KIP can still bring
> > >> > if implemented on the latest trunk, which includes the async ZK
> > >> operations
> > >> > for KAFKA-5642.
> > >> > The evaluation is done using an integration test.
> > >> > In production, we have not upgraded to Kafka 1.1 yet, and the code
> we
> > >> are
> > >> > currently running does
> > >> > not include async ZK operations, therefore I don't have any real
> usage
> > >> > result.
> > >> >
> > >> > 2. Thanks for bringing this up. I haven't considered this setting,
> and
> > >> the
> > >> > existing proposal in this KIP
> > >> > would make data requests and controller requests share a memory poll
> > of
> > >> > size specified by the config
> > >> > queued.max.request.bytes. The downside is that if there is memory
> > >> pressure,
> > >> > controller requests may be blocked
> > >> > from being read from a socket and does not get prioritized at the
> > socket
> > >> > layer.
> > >> >
> > >> > If we have a separate bytes limit for the controller requests, I
> > imagine
> > >> > there would be a separate memory pool
> > >> > dedicated to controller requests. Also it requires the processors to
> > >> tell
> > >> > connections from a controller apart
> > >> > from connections from other brokers or clients, which would probably
> > >> > require a dedicated port for the controller?
> > >> > IMO, this change is mainly driven by the memory pressure, kind of an
> > >> > orthogonal issue, and we can address it with a separate KIP
> > >> > if desired. Please let me know what you think.
> > >> >
> > >> > 3. I plan to change the implementation of the method
> > >> > receiveRequest(timeout: Long) in the RequestChannel class as
> follows:
> > >> >
> > >> > val controllerRequest = controllerRequestQueue.poll()
> > >> > if (controllerRequest != null) {
> > >> >   controllerReq
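
The quoted snippet is cut off by the archive. Purely as a sketch of the
two-queue idea being described -- the real RequestChannel is Scala, and the
class and field names below are made up -- the polling logic could look
roughly like this in Java:

```java
// Illustrative only: drain controller requests first, then fall back to the
// data request queue with a bounded wait so new controller requests are
// noticed quickly.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PrioritizedRequestChannel<R> {
    private final BlockingQueue<R> controllerRequestQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<R> dataRequestQueue = new LinkedBlockingQueue<>();

    R receiveRequest(long timeoutMs) throws InterruptedException {
        final R controllerRequest = controllerRequestQueue.poll();
        if (controllerRequest != null) {
            return controllerRequest;
        }
        // No controller request pending: wait on the data queue for at most
        // timeoutMs (e.g. 300 ms), after which the caller re-checks both queues.
        return dataRequestQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```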

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

2018-07-03 Thread Dong Lin
Hey Jun,

Thanks much for the comments. That is a good point. So the feature may be
useful for the JBOD use case. I have one question below.

Hey Lucas,

Do you think this feature is also useful for a non-JBOD setup, or is it only
useful for the JBOD setup? It would be helpful to understand this.

When the broker is set up using JBOD, in order to move leaders on the failed
disk to other disks, the system operator first needs to get the list of
partitions on the failed disk. This is currently achieved using
AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the
broker. If we only prioritize the controller requests, then the
DescribeLogDirsRequest
may still take a long time to be processed by the broker. So the overall
time to move leaders away from the failed disk may still be long even with
this KIP. What do you think?
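
For context, listing the partitions hosted on a given log directory with the
AdminClient API available in 1.1/2.0 looks roughly like the sketch below (the
bootstrap address and broker id are placeholders):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.requests.DescribeLogDirsResponse.LogDirInfo;
import org.apache.kafka.common.requests.DescribeLogDirsResponse.ReplicaInfo;

public class ListPartitionsPerLogDir {
    public static void main(String[] args) throws Exception {
        final Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        final int brokerId = 1; // placeholder: broker with the failed disk

        try (AdminClient admin = AdminClient.create(props)) {
            // The DescribeLogDirsRequest is sent to the broker here; with this KIP
            // it would still be queued behind pending data requests.
            final Map<String, LogDirInfo> logDirs =
                admin.describeLogDirs(Collections.singleton(brokerId)).all().get().get(brokerId);

            for (final Map.Entry<String, LogDirInfo> dir : logDirs.entrySet()) {
                System.out.println("log dir " + dir.getKey() + " (error: " + dir.getValue().error + ")");
                for (final Map.Entry<TopicPartition, ReplicaInfo> replica
                        : dir.getValue().replicaInfos.entrySet()) {
                    System.out.println("  " + replica.getKey() + " size=" + replica.getValue().size);
                }
            }
        }
    }
}
```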

Thanks,
Dong


On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang  wrote:

> Thanks for the insightful comment, Jun.
>
> @Dong,
> Since both comments in your previous email are about the
> benefits of this KIP and whether it's useful,
> in light of Jun's last comment, do you agree that this KIP can be
> beneficial in the case mentioned by Jun?
> Please let me know, thanks!
>
> Regards,
> Lucas
>
> On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao  wrote:
>
> > Hi, Lucas, Dong,
> >
> > If all disks on a broker are slow, one probably should just kill the
> > broker. In that case, this KIP may not help. If only one of the disks on
> a
> > broker is slow, one may want to fail that disk and move the leaders on
> that
> > disk to other brokers. In that case, being able to process the
> LeaderAndIsr
> > requests faster will potentially help the producers recover quicker.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin  wrote:
> >
> > > Hey Lucas,
> > >
> > > Thanks for the reply. Some follow up questions below.
> > >
> > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> > randomly
> > > distributed across all partitions, then each ProduceRequest will likely
> > > cover some partitions for which the broker is still leader after it
> > quickly
> > > processes the
> > > LeaderAndIsrRequest. Then broker will still be slow in processing these
> > > ProduceRequest and request will still be very high with this KIP. It
> > seems
> > > that most ProduceRequest will still timeout after 30 seconds. Is this
> > > understanding correct?
> > >
> > > Regarding 2, if most ProduceRequests will still time out after 30
> seconds,
> > > then it is less clear how this KIP reduces average produce latency. Can
> > you
> > > clarify what metrics can be improved by this KIP?
> > >
> > > Not sure why the system operator directly cares about the number of truncated
> messages.
> > > Do you mean this KIP can improve average throughput or reduce message
> > > duplication? It will be good to understand this.
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > >
> > >
> > >
> > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang 
> wrote:
> > >
> > > > Hi Dong,
> > > >
> > > > Thanks for your valuable comments. Please see my reply below.
> > > >
> > > > 1. The Google doc showed only 1 partition. Now let's consider a more
> > > common
> > > > scenario
> > > > where broker0 is the leader of many partitions. And let's say for
> some
> > > > reason its IO becomes slow.
> > > > The number of leader partitions on broker0 is so large, say 10K, that
> > the
> > > > cluster is skewed,
> > > > and the operator would like to shift the leadership for a lot of
> > > > partitions, say 9K, to other brokers,
> > > > either manually or through some service like cruise control.
> > > > With this KIP, not only will the leadership transitions finish more
> > > > quickly, helping the cluster itself become more balanced,
> > > > but all existing producers corresponding to the 9K partitions will
> get
> > > the
> > > > errors relatively quickly
> > > > rather than relying on their timeout, thanks to the batched async ZK
> > > > operations.
> > > > To me it's a useful feature to have during such troublesome times.
> > > >
> > > >
> > > > 2. The experiments in the Google Doc have shown that with this KIP
> many
> > > > producers
> > > > receive an explicit error NotLeaderForPartition, based on which they
> > > retry
> > > > immediately.
> > > > Therefore the latency (~14 seconds+quick retry) for their single
> > message
> > > is
> > > > much smaller
> > > > compared with the case of timing out without the KIP (30 seconds for
> > > timing
> > > > out + quick retry).
> > > > One might argue that reducing the timeout on the producer side can
> > > > achieve the same result,
> > > > yet reducing the timeout has its own drawbacks[1].
> > > >
> > > > Also *IF* there were a metric to show the number of truncated
> messages
> > on
> > > > brokers,
> > > > with the experiments done in the Google Doc, it should be easy to see
> > > that
> > > > a lot fewer messages need
> > > > to be truncated on broker0 since the up-to-date metadata avoids
>

Build failed in Jenkins: kafka-0.11.0-jdk7 #390

2018-07-03 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: updatd for 0.11.0.3 release (#5326)

--
[...truncated 1.56 MB...]

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testCountWithInternalStore STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testCountWithInternalStore PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggCoalesced STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggCoalesced PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggRepartition STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testAggRepartition PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testRemoveOldBeforeAddNew STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testRemoveOldBeforeAddNew PASSED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testCountCoalesced STARTED

org.apache.kafka.streams.kstream.internals.KTableAggregateTest > 
testCountCoalesced PASSED

org.apache.kafka.streams.kstream.internals.KTableForeachTest > testForeach 
STARTED

org.apache.kafka.streams.kstream.internals.KTableForeachTest > testForeach 
PASSED

org.apache.kafka.streams.kstream.internals.KTableForeachTest > testTypeVariance 
STARTED

org.apache.kafka.streams.kstream.internals.KTableForeachTest > testTypeVariance 
PASSED

org.apache.kafka.streams.kstream.SessionWindowsTest > 
retentionTimeShouldBeGapIfGapIsLargerThanDefaultRetentionTime STARTED

org.apache.kafka.streams.kstream.SessionWindowsTest > 
retentionTimeShouldBeGapIfGapIsLargerThanDefaultRetentionTime PASSED

org.apache.kafka.streams.kstream.SessionWindowsTest > shouldSetWindowGap STARTED

org.apache.kafka.streams.kstream.SessionWindowsTest > shouldSetWindowGap PASSED

org.apache.kafka.streams.kstream.SessionWindowsTest > 
retentionTimeMustNotBeNegative STARTED

org.apache.kafka.streams.kstream.SessionWindowsTest > 
retentionTimeMustNotBeNegative PASSED

org.apache.kafka.streams.kstream.SessionWindowsTest > windowSizeMustNotBeZero 
STARTED

org.apache.kafka.streams.kstream.SessionWindowsTest > windowSizeMustNotBeZero 
PASSED

org.apache.kafka.streams.kstream.SessionWindowsTest > 
windowSizeMustNotBeNegative STARTED

org.apache.kafka.streams.kstream.SessionWindowsTest > 
windowSizeMustNotBeNegative PASSED

org.apache.kafka.streams.kstream.SessionWindowsTest > 
shouldSetWindowRetentionTime STARTED

org.apache.kafka.streams.kstream.SessionWindowsTest > 
shouldSetWindowRetentionTime PASSED

org.apache.kafka.streams.kstream.WindowTest > 
shouldThrowIfEndIsSmallerThanStart STARTED

org.apache.kafka.streams.kstream.WindowTest > 
shouldThrowIfEndIsSmallerThanStart PASSED

org.apache.kafka.streams.kstream.WindowTest > 
shouldNotBeEqualIfDifferentWindowType STARTED

org.apache.kafka.streams.kstream.WindowTest > 
shouldNotBeEqualIfDifferentWindowType PASSED

org.apache.kafka.streams.kstream.WindowTest > shouldBeEqualIfStartAndEndSame 
STARTED

org.apache.kafka.streams.kstream.WindowTest > shouldBeEqualIfStartAndEndSame 
PASSED

org.apache.kafka.streams.kstream.WindowTest > shouldNotBeEqualIfNull STARTED

org.apache.kafka.streams.kstream.WindowTest > shouldNotBeEqualIfNull PASSED

org.apache.kafka.streams.kstream.WindowTest > shouldThrowIfStartIsNegative 
STARTED

org.apache.kafka.streams.kstream.WindowTest > shouldThrowIfStartIsNegative 
PASSED

org.apache.kafka.streams.kstream.WindowTest > 
shouldNotBeEqualIfStartOrEndIsDifferent STARTED

org.apache.kafka.streams.kstream.WindowTest > 
shouldNotBeEqualIfStartOrEndIsDifferent PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldAddRegexTopicToLatestAutoOffsetResetList STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldAddRegexTopicToLatestAutoOffsetResetList PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldMapStateStoresToCorrectSourceTopics STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldMapStateStoresToCorrectSourceTopics PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldAddTableToEarliestAutoOffsetResetList STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldAddTableToEarliestAutoOffsetResetList PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldAddTimestampExtractorToTablePerSource STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldAddTimestampExtractorToTablePerSource PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldBuildGlobalTopologyWithAllGlobalTables STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldBuildGlobalTopologyWithAllGlobalTables PASSED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldAddRegexTopicToEarliestAutoOffsetResetList STARTED

org.apache.kafka.streams.kstream.KStreamBuilderTest > 
shouldAddRegexTopi

[VOTE] KIP-321: Add method to get TopicNameExtractor in TopologyDescription

2018-07-03 Thread Nishanth Pradeep
Hello,

I would like to start the vote on extending the TopologyDescription.Sink
interface to return the class of the TopicNameExtractor in cases where
dynamic routing is used.

The user can override the toString method of the TopicNameExtractor class
in order to provide a better textual description if he or she chooses.
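
For illustration only (not from the original message), an extractor with an
overridden toString could look like the following; the record types and the
per-tenant topic naming scheme are made-up assumptions:

```java
import org.apache.kafka.streams.processor.RecordContext;
import org.apache.kafka.streams.processor.TopicNameExtractor;

// Hypothetical extractor that routes each record to a per-tenant topic.
public class TenantTopicNameExtractor implements TopicNameExtractor<String, String> {

    @Override
    public String extract(final String key, final String value, final RecordContext recordContext) {
        // Assumed convention: the key prefix before ':' identifies the tenant.
        final int idx = key.indexOf(':');
        final String tenant = idx > 0 ? key.substring(0, idx) : "unknown";
        return "events-" + tenant;
    }

    @Override
    public String toString() {
        // A readable description that a tool printing the TopologyDescription
        // could surface instead of just the class name.
        return "TenantTopicNameExtractor(events-<tenant>)";
    }
}
```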

https://cwiki.apache.org/confluence/display/KAFKA/KIP-321%3A+Add+method+to+get+TopicNameExtractor+in+TopologyDescription

Best,
Nishanth Pradeep


Re: [VOTE] 2.0.0 RC1

2018-07-03 Thread Brett Rann
+1 tentative
Rolling upgrade of a tiny shared staging multitenancy (200+ consumer groups)
cluster from 1.1 to 2.0.0-rc1. Cluster looks healthy. Will monitor.

On Tue, Jul 3, 2018 at 8:18 AM Harsha  wrote:

> +1.
>
> 1) Ran unit tests
> 2) 3 node cluster , tested basic operations.
>
> Thanks,
> Harsha
>
> On Mon, Jul 2nd, 2018 at 11:13 AM, "Vahid S Hashemian" <
> vahidhashem...@us.ibm.com> wrote:
>
> >
> >
> >
> > +1 (non-binding)
> >
> > Built from source and ran quickstart successfully on Ubuntu (with Java
> 8).
> >
> >
> > Minor: It seems this doc update PR is not included in the RC:
> > https://github.com/apache/kafka/pull/5280
> 
> > Guozhang seems to have wanted to cherry-pick it to 2.0.
> >
> > Thanks Rajini!
> > --Vahid
> >
> >
> >
> >
> > From: Rajini Sivaram < rajinisiva...@gmail.com >
> > To: dev < dev@kafka.apache.org >, Users < us...@kafka.apache.org >,
> > kafka-clients < kafka-clie...@googlegroups.com >
> > Date: 06/29/2018 11:36 AM
> > Subject: [VOTE] 2.0.0 RC1
> >
> >
> >
> > Hello Kafka users, developers and client-developers,
> >
> >
> > This is the second candidate for release of Apache Kafka 2.0.0.
> >
> >
> > This is a major version release of Apache Kafka. It includes 40 new KIPs
> > and
> >
> > several critical bug fixes. Please see the 2.0.0 release plan for more
> > details:
> >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80448820
> 
> >
> >
> >
> > A few notable highlights:
> >
> > - Prefixed wildcard ACLs (KIP-290), Fine grained ACLs for CreateTopics
> > (KIP-277)
> > - SASL/OAUTHBEARER implementation (KIP-255)
> > - Improved quota communication and customization of quotas (KIP-219,
> > KIP-257)
> > - Efficient memory usage for down conversion (KIP-283)
> > - Fix log divergence between leader and follower during fast leader
> > failover (KIP-279)
> > - Drop support for Java 7 and remove deprecated code including old
> > scala
> > clients
> > - Connect REST extension plugin, support for externalizing secrets and
> > improved error handling (KIP-285, KIP-297, KIP-298 etc.)
> > - Scala API for Kafka Streams and other Streams API improvements
> > (KIP-270, KIP-150, KIP-245, KIP-251 etc.)
> >
> > Release notes for the 2.0.0 release:
> >
> > http://home.apache.org/~rsivaram/kafka-2.0.0-rc1/RELEASE_NOTES.html
> 
> >
> >
> >
> >
> > *** Please download, test and vote by Tuesday, July 3rd, 4pm PT
> >
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> >
> > http://kafka.apache.org/KEYS
> 
> >
> >
> >
> > * Release artifacts to be voted upon (source and binary):
> >
> > http://home.apache.org/~rsivaram/kafka-2.0.0-rc1/
> 
> >
> >
> >
> > * Maven artifacts to be voted upon:
> >
> > https://repository.apache.org/content/groups/staging/
> 
> >
> >
> >
> > * Javadoc:
> >
> > http://home.apache.org/~rsivaram/kafka-2.0.0-rc1/javadoc/
> 
> >
> >
> >
> > * Tag to be voted upon (off 2.0 branch) is the 2.0.0 tag:
> >
> > https://github.com/apache/kafka/tree/2.0.0-rc1
> 
> >
> >
> >
> > * Documentation:
> >
> > http://kafka.apache.org/20/documentation.html
> 
> >
> >
> >
> > * Protocol:
> >
> > http://kafka.apache.org/20/protocol.html
> 
> >
> >
> >
> > * Successful Jenkins builds for the 2.0 branch:
> >
> > Unit/integration tests:
> > https://builds.apache.org/job/kafka-2.0-jdk8/66/
> 
> >
> >
> > System tests:
> > https://jenkins.confluent.io/job/system-test-kafka/job/2.0/15/
> 
> >
> >
> >
> >
> > Please test and verify the release artifacts and submit a vote for this
> RC
> >
> > or report any issues so that we can fix them and roll out a new RC ASAP!
> >
> > Although this release vote requires PMC votes to pass, testing, votes,
> and
> >
> > bug
> > reports are valuable and appreciated from everyone.
> >
> >
> > Thanks,
> >
> >
> > Rajini
> >
> >
> >
> >
> >
> >
> >
> >
>


-- 

Brett Rann

Senior DevOps Engineer


Zendesk International Ltd

395 Collins Street, Melbourne VIC 3000 Australia


Re: [VOTE] KIP-321: Add method to get TopicNameExtractor in TopologyDescription

2018-07-03 Thread Ted Yu
Hi,
I don't seem to find a response to John's comment:

http://search-hadoop.com/m/Kafka/uyzND11alrn1G5N3Y1?subj=Re+Discuss+KIP+321+Add+method+to+get+TopicNameExtractor+in+TopologyDescription

On Tue, Jul 3, 2018 at 7:38 PM, Nishanth Pradeep 
wrote:

> Hello,
>
> I would like to start the vote on extending the TopologyDescription.Sink
> interface to return the class of the TopicNameExtractor in cases where
> dynamic routing is used.
>
> The user can override the toString method of the TopicNameExtractor class
> in order to provide a better textual description if he or she chooses.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 321%3A+Add+method+to+get+TopicNameExtractor+in+TopologyDescription
>
> Best,
> Nishanth Pradeep
>


Build failed in Jenkins: kafka-0.11.0-jdk7 #391

2018-07-03 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: fix Streams quickstart (#5331)

--
[...truncated 213.81 KB...]
kafka.producer.AsyncProducerTest > testFailedSendRetryLogic PASSED

kafka.producer.AsyncProducerTest > testQueueTimeExpired STARTED

kafka.producer.AsyncProducerTest > testQueueTimeExpired PASSED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents STARTED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents PASSED

kafka.producer.AsyncProducerTest > testBatchSize STARTED

kafka.producer.AsyncProducerTest > testBatchSize PASSED

kafka.producer.AsyncProducerTest > testSerializeEvents STARTED

kafka.producer.AsyncProducerTest > testSerializeEvents PASSED

kafka.producer.AsyncProducerTest > testProducerQueueSize STARTED

kafka.producer.AsyncProducerTest > testProducerQueueSize PASSED

kafka.producer.AsyncProducerTest > testRandomPartitioner STARTED

kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED

kafka.producer.AsyncProducerTest > testInvalidConfiguration STARTED

kafka.producer.AsyncProducerTest > testInvalidConfiguration PASSED

kafka.producer.AsyncProducerTest > testInvalidPartition STARTED

kafka.producer.AsyncProducerTest > testInvalidPartition PASSED

kafka.producer.AsyncProducerTest > testNoBroker STARTED

kafka.producer.AsyncProducerTest > testNoBroker PASSED

kafka.producer.AsyncProducerTest > testProduceAfterClosed STARTED

kafka.producer.AsyncProducerTest > testProduceAfterClosed PASSED

kafka.producer.AsyncProducerTest > testJavaProducer STARTED

kafka.producer.AsyncProducerTest > testJavaProducer PASSED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder STARTED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED

kafka.api.LogAppendTimeTest > testProduceConsume STARTED

kafka.api.LogAppendTimeTest > testProduceConsume PASSED

kafka.api.FetchRequestTest > testShuffleWithSingleTopic STARTED

kafka.api.FetchRequestTest > testShuffleWithSingleTopic PASSED

kafka.api.FetchRequestTest > testShuffle STARTED

kafka.api.FetchRequestTest > testShuffle PASSED

kafka.api.PlaintextProducerSendTest > 
testSendCompressedMessageWithLogAppendTime STARTED

kafka.api.PlaintextProducerSendTest > 
testSendCompressedMessageWithLogAppendTime PASSED

kafka.api.PlaintextProducerSendTest > testAutoCreateTopic STARTED

kafka.api.PlaintextProducerSendTest > testAutoCreateTopic PASSED

kafka.api.PlaintextProducerSendTest > testSendWithInvalidCreateTime STARTED

kafka.api.PlaintextProducerSendTest > testSendWithInvalidCreateTime PASSED

kafka.api.PlaintextProducerSendTest > testBatchSizeZero STARTED

kafka.api.PlaintextProducerSendTest > testBatchSizeZero PASSED

kafka.api.PlaintextProducerSendTest > testWrongSerializer STARTED

kafka.api.PlaintextProducerSendTest > testWrongSerializer PASSED

kafka.api.PlaintextProducerSendTest > 
testSendNonCompressedMessageWithLogAppendTime STARTED

kafka.api.PlaintextProducerSendTest > 
testSendNonCompressedMessageWithLogAppendTime PASSED

kafka.api.PlaintextProducerSendTest > 
testSendNonCompressedMessageWithCreateTime STARTED

kafka.api.PlaintextProducerSendTest > 
testSendNonCompressedMessageWithCreateTime PASSED

kafka.api.PlaintextProducerSendTest > testClose STARTED

kafka.api.PlaintextProducerSendTest > testClose PASSED

kafka.api.PlaintextProducerSendTest > testFlush STARTED

kafka.api.PlaintextProducerSendTest > testFlush PASSED

kafka.api.PlaintextProducerSendTest > testSendToPartition STARTED

kafka.api.PlaintextProducerSendTest > testSendToPartition PASSED

kafka.api.PlaintextProducerSendTest > testSendOffset STARTED

kafka.api.PlaintextProducerSendTest > testSendOffset PASSED

kafka.api.PlaintextProducerSendTest > testSendCompressedMessageWithCreateTime 
STARTED

kafka.api.PlaintextProducerSendTest > testSendCompressedMessageWithCreateTime 
PASSED

kafka.api.PlaintextProducerSendTest > testCloseWithZeroTimeoutFromCallerThread 
STARTED

kafka.api.PlaintextProducerSendTest > testCloseWithZeroTimeoutFromCallerThread 
PASSED

kafka.api.PlaintextProducerSendTest > testCloseWithZeroTimeoutFromSenderThread 
STARTED

kafka.api.PlaintextProducerSendTest > testCloseWithZeroTimeoutFromSenderThread 
PASSED

kafka.api.PlaintextProducerSendTest > testSendBeforeAndAfterPartitionExpansion 
STARTED

kafka.api.PlaintextProducerSendTest > testSendBeforeAndAfterPartitionExpansion 
PASSED

kafka.api.PlaintextConsumerTest > testEarliestOrLatestOffsets STARTED

kafka.api.PlaintextConsumerTest > testEarliestOrLatestOffsets PASSED

kafka.api.PlaintextConsumerTest > testPartitionsForAutoCreate STARTED

kafka.api.PlaintextConsumerTest > testPartitionsForAutoCreate PASSED

kafka.api.PlaintextConsumerTest > testShrinkingTopicSubscriptions STARTED

kafka.api.PlaintextConsumerTest > testShrinkingTopicSubscriptions PASSED

kafka.api.PlaintextConsumerTest > testMaxPollIntervalMs STARTED

kafka.api.PlaintextConsumerTest > testMa

Re: [Discuss] KIP-321: Add method to get TopicNameExtractor in TopologyDescription

2018-07-03 Thread Matthias J. Sax
John,

I am a little bit on the fence. In retrospect, it might have been
better to add `topic()` and `topicPattern()` to the source node and return a
proper `Pattern` object instead of the pattern as a String.

All other "payload" is just names and thus String naturally. From my
point of view `TopologyDescription` should represent the `Topology` in a
"machine readable" form plus a default "human readable" from via
`toString()` -- this does not imply that all return types should be String.

Let me know what you think. If you agree, we could even add
`Source#topicPattern()` in another KIP.


-Matthias

On 6/26/18 3:45 PM, John Roesler wrote:
> Sorry for the late comment,
> 
> Looking at the other pieces of TopologyDescription, I noticed that pretty
> much all of the "payload" of these description nodes are strings. Should we
> consider returning a string from `topicNameExtractor()` instead?
> 
> In fact, if we did that, we could consider calling `toString()` on the
> extractor instead of returning the class name. This would allow authors of
> the extractors to provide more information about the extractor than just
> its name. This might be especially useful in the case of anonymous
> implementations.
> 
> Thanks for the KIP,
> -John
> 
> On Mon, Jun 25, 2018 at 11:52 PM Ted Yu  wrote:
> 
>> My previous response was talking about the new method in
>> InternalTopologyBuilder.
>> The exception just means there is no uniform extractor for all the sinks.
>>
>> On Mon, Jun 25, 2018 at 8:02 PM, Matthias J. Sax 
>> wrote:
>>
>>> Ted,
>>>
>>> Why? Each sink can have a different TopicNameExtractor.
>>>
>>>
>>> -Matthias
>>>
>>> On 6/25/18 5:19 PM, Ted Yu wrote:
 If there are different TopicNameExtractor classes from multiple sink
>>> nodes,
> the new method should throw an exception alerting the user of such a scenario.


 On Mon, Jun 25, 2018 at 2:23 PM, Bill Bejeck 
>> wrote:

> Thanks for the KIP!
>
> Overall I'm +1 on the KIP.   I have one question.
>
> The KIP states that the method "topicNameExtractor()" is added to the
> InternalTopologyBuilder.java.
>
> It could be that I'm missing something, but how does this work if a
>> user
> has provided different TopicNameExtractor instances to multiple sink
>>> nodes?
>
> Thanks,
> Bill
>
>
>
> On Mon, Jun 25, 2018 at 1:25 PM Guozhang Wang 
>>> wrote:
>
>> Yup I agree, generally speaking the `toString()` output is not
> recommended
>> to be relied on programmatically in user's code, but we've observed
>> convenience-beats-any-other-reasons again and again in development
>> unfortunately. I think we should still not claim it is part of the
>> public APIs that would not be changed anyhow in the future, but just
>> mentioning it in the wiki for people to be aware is fine.
>>
>>
>> Guozhang
>>
>> On Sun, Jun 24, 2018 at 5:01 PM, Matthias J. Sax <
>>> matth...@confluent.io>
>> wrote:
>>
>>> Thanks for the KIP!
>>>
>>> I don't have any further comments.
>>>
>>> For Guozhang's comment: if we mention anything about `toString()`,
>> we
>>> should make explicit that `toString()` output is still not public
>>> contract and users should not rely on the output.
>>>
>>> Furthermore, for the actual `toString()` output, I would replace "topic:" by
>>> "extractor class:" to make the difference obvious.
>>>
>>> I am just hoping that people do not actually rely on `toString()`,
>> which
>>> defeats the purpose of the `TopologyDescription` class that was
>>> introduced to avoid the dependency... (Just a side comment, not
>> really
>>> related to this KIP proposal itself).
>>>
>>>
>>> If there are no further comments in the next days, feel free to
>> start
>>> the VOTE and open a PR.
>>>
>>>
>>>
>>>
>>> -Matthias
>>>
>>> On 6/22/18 6:04 PM, Guozhang Wang wrote:
 Thanks for writing the KIP!

 I'm +1 on the proposed changes over all. One minor suggestion: we
>> should
 also mention that the `Sink#toString` will also be updated, in a
>> way
>> that
 if `topic()` returns null, use the other call, etc. This is because
 although we do not explicitly state the following logic as public
>>> protocols:

 ```

 "Sink: " + name + " (topic: " + topic() + ")\n  <-- " +
 nodeNames(predecessors);


 ```

 There are already some users that rely on
>> `topology.describe().toString(
>>> )`
 to have runtime checks on the returned string values, so changing
> this
 means that their app will break and hence need to make code
>> changes.

 Guozhang

 On Wed, Jun 20, 2018 at 7:20 PM, Nishanth Pradeep <
>> nishanth...@gmail.com

 wrote:

> Hello 

Re: [Discuss] KIP-321: Add method to get TopicNameExtractor in TopologyDescription

2018-07-03 Thread Matthias J. Sax
I just double checked the discussion thread of KIP-120 that introduced
`TopologyDescription`. Back then the argument was that using the
simplest option might be sufficient because the description is mostly
used for debugging.

Not sure if this argument holds. It seems that people have already built more
sophisticated tools using TopologyDescription.

Final note: if we really want to add `topicPattern()` we might want to
deprecate `topic()` and replace it with `Set<String> topics()`, because a
source node can take multiple topics, too.
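
Purely as a sketch of what such an evolved source description could expose
(illustrative only, not an agreed API):

```java
import java.util.Set;
import java.util.regex.Pattern;

// Illustrative sketch of a Source description that returns typed values
// instead of a single topic String; names mirror the discussion above.
public interface SourceDescription /* would extend TopologyDescription.Node */ {

    /** All concrete topic names the source node subscribes to, if any. */
    Set<String> topics();

    /** The subscription pattern, or null if concrete topic names are used. */
    Pattern topicPattern();

    /** @deprecated would be replaced by {@link #topics()}. */
    @Deprecated
    String topic();
}
```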

Just adding this for completeness of context to the discussion.


-Matthias

On 7/3/18 11:09 PM, Matthias J. Sax wrote:
> John,
> 
> I am a little bit on the fence. In retrospect, it might have been
> better to add `topic()` and `topicPattern()` to the source node and return a
> proper `Pattern` object instead of the pattern as a String.
> 
> All other "payload" is just names and thus String naturally. From my
> point of view `TopologyDescription` should represent the `Topology` in a
> "machine readable" form plus a default "human readable" from via
> `toString()` -- this does not imply that all return types should be String.
> 
> Let me know what you think. If you agree, we could even add
> `Source#topicPattern()` in another KIP.
> 
> 
> -Matthias
> 
> On 6/26/18 3:45 PM, John Roesler wrote:
>> Sorry for the late comment,
>>
>> Looking at the other pieces of TopologyDescription, I noticed that pretty
>> much all of the "payload" of these description nodes are strings. Should we
>> consider returning a string from `topicNameExtractor()` instead?
>>
>> In fact, if we did that, we could consider calling `toString()` on the
>> extractor instead of returning the class name. This would allow authors of
>> the extractors to provide more information about the extractor than just
>> its name. This might be especially useful in the case of anonymous
>> implementations.
>>
>> Thanks for the KIP,
>> -John
>>
>> On Mon, Jun 25, 2018 at 11:52 PM Ted Yu  wrote:
>>
>>> My previous response was talking about the new method in
>>> InternalTopologyBuilder.
>>> The exception just means there is no uniform extractor for all the sinks.
>>>
>>> On Mon, Jun 25, 2018 at 8:02 PM, Matthias J. Sax 
>>> wrote:
>>>
 Ted,

 Why? Each sink can have a different TopicNameExtractor.


 -Matthias

 On 6/25/18 5:19 PM, Ted Yu wrote:
> If there are different TopicNameExtractor classes from multiple sink
 nodes,
> the new method should throw an exception alerting the user of such a scenario.
>
>
> On Mon, Jun 25, 2018 at 2:23 PM, Bill Bejeck 
>>> wrote:
>
>> Thanks for the KIP!
>>
>> Overall I'm +1 on the KIP.   I have one question.
>>
>> The KIP states that the method "topicNameExtractor()" is added to the
>> InternalTopologyBuilder.java.
>>
>> It could be that I'm missing something, but how does this work if a
>>> user
>> has provided different TopicNameExtractor instances to multiple sink
 nodes?
>>
>> Thanks,
>> Bill
>>
>>
>>
>> On Mon, Jun 25, 2018 at 1:25 PM Guozhang Wang 
 wrote:
>>
>>> Yup I agree, generally speaking the `toString()` output is not
>> recommended
>>> to be relied on programmatically in user's code, but we've observed
>>> convenience-beats-any-other-reasons again and again in development
>>> unfortunately. I think we should still not claim it is part of the
>>> public APIs that would not be changed anyhow in the future, but just
>>> mentioning it in the wiki for people to be aware is fine.
>>>
>>>
>>> Guozhang
>>>
>>> On Sun, Jun 24, 2018 at 5:01 PM, Matthias J. Sax <
 matth...@confluent.io>
>>> wrote:
>>>
 Thanks for the KIP!

 I don't have any further comments.

 For Guozhang's comment: if we mention anything about `toString()`,
>>> we
 should make explicit that `toString()` output is still not public
 contract and users should not rely on the output.

 Furthermore, for the actual `toString()` output, I would replace "topic:" by
 "extractor class:" to make the difference obvious.

 I am just hoping that people do not actually rely on `toString()`,
>>> which
 defeats the purpose of the `TopologyDescription` class that was
 introduced to avoid the dependency... (Just a side comment, not
>>> really
 related to this KIP proposal itself).


 If there are no further comments in the next days, feel free to
>>> start
 the VOTE and open a PR.




 -Matthias

 On 6/22/18 6:04 PM, Guozhang Wang wrote:
> Thanks for writing the KIP!
>
> I'm +1 on the proposed changes over all. One minor suggestion: we
>>> should
> also mention that the `Sink#toString` will also be updated, in a
>>> way
>>> that