Re: Kafka Command Line & Shell
+100 -- Colin Clark +1-320-221-9531 > On Oct 17, 2014, at 7:10 PM, Steve Morin wrote: > > Joe I think this is great! > >> On Fri, Oct 17, 2014 at 5:03 PM, Joe Stein wrote: >> >> Hi, I have been thinking about the ease of use for operations with Kafka. >> We have lots of tools doing a lot of different things and they are all kind >> of in different places. >> >> So, what I was thinking is to have a single interface for our tooling >> https://issues.apache.org/jira/browse/KAFKA-1694 >> >> This would manifest itself in two ways 1) a command line interface 2) a >> repl >> >> We would have one entry point centrally for all Kafka commands. >> kafka >> kafka createTopic --brokerList etc, >> kafka reassignPartition --brokerList etc, >> >> or execute and run the shell >> >> kafka --brokerList localhost >> kafka>use topicName; >> kafka>set acl='label'; >> >> I was thinking that all calls would be initialized through --brokerList and >> the broker can tell the KafkaCommandTool what server to connect to for >> MetaData. >> >> Thoughts? Tomatoes? >> >> /*** >> Joe Stein >> Founder, Principal Consultant >> Big Data Open Source Security LLC >> http://www.stealth.ly >> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop> >> / >>
Re: Compile failure going from kafka 0.8.1.1 to 0.8.2
+1 on two methods -- Colin Clark +1-320-221-9531 > On Oct 30, 2014, at 11:24 AM, Jonathan Weeks wrote: > > +1 on the two methods suggestion > > -JW > >> On Oct 30, 2014, at 9:20 AM, Jay Kreps wrote: >> >> But Jun, >> >> That change breaks peoples code who weren't calling in that way. I don't >> think we should be making breaking changes in a point release like this. >> >> I think we should treat this like a bug for 0.8.2 final, we should be able >> to add two commitOffsets methods with and without the param which should >> fix the problem, right? >> >> -Jay >> >>> On Thu, Oct 30, 2014 at 8:59 AM, Jun Rao wrote: >>> >>> Jack, >>> >>> The commit offset api is changed slightly from >>> >>> def commitOffsets() in 0.8.1.x >>> >>> to >>> >>> def commitOffsets(retryOnFailure: Boolean = true) in 0.8.2.x. >>> >>> If you have been calling the method with parentheses like commitOffsets(), >>> then the code will compile in both 0.8.1.x and 0.8.2.x. In general, the >>> scala rule (http://docs.scala-lang.org/style/method-invocation.html) for >>> omitting parentheses when calling arity-0 methods is that the methods in >>> question have no side effects. In this case, commitOffsets() clearly has >>> side effect and should have been called with parentheses. >>> >>> Thanks, >>> >>> Jun >>> >>> >>> >>> >>>> On Wed, Oct 29, 2014 at 12:40 PM, Jack Foy wrote: >>>> >>>> My Scala project built against kafka 0.8.1.1 commits consumer offsets as >>>> follows: >>>> >>>> connector.commitOffsets >>>> >>>> This compiles without warnings. When I bumped the library dependency to >>>> 0.8.2-beta, the compiler started emitting this error: >>>> >>>> [error] >>>> src/main/scala/com/whitepages/kafka/consumer/Connector.scala:21: missing >>>> arguments for method commitOffsets in trait ConsumerConnector; >>>> [error] follow this method with `_' if you want to treat it as a >>>> partially applied function >>>> [error] connector.commitOffsets >>>> [error] ^ >>>> [error] one error found >>>> [error] (compile:compile) Compilation failed >>>> >>>> The following change resolved the error: >>>> >>>> -connector.commitOffsets >>>> +connector.commitOffsets() >>>> >>>> Should we expect compilation-breaking changes moving from 0.8.1.1 to >>>> 0.8.2-beta? >>>> >>>> -- >>>> Jack Foy >>>> >>>> >>>> >>>> >>> >
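For clarity, the backwards-compatible fix being proposed is just an overload. A minimal sketch of its shape, written as a Java interface purely for illustration (the real ConsumerConnector is a Scala trait):

    // Keep the old zero-argument signature so existing 0.8.1.x callers still
    // compile, and add the variant that takes the retry flag.
    public interface ConsumerConnectorSketch {
        void commitOffsets();                        // behaves as in 0.8.1.x

        void commitOffsets(boolean retryOnFailure);  // added in 0.8.2.x
    }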
Re: [ANNOUNCE] New Committer: Manikumar Reddy
Congratulations, Manikumar! Well done. best, Colin On Fri, Oct 12, 2018, at 01:25, Edoardo Comar wrote: > Well done Manikumar ! > -- > > Edoardo Comar > > IBM Event Streams > IBM UK Ltd, Hursley Park, SO21 2JN > > > > > From: "Matthias J. Sax" > To: dev > Cc: users > Date: 11/10/2018 23:41 > Subject:Re: [ANNOUNCE] New Committer: Manikumar Reddy > > > > Congrats! > > > On 10/11/18 2:31 PM, Yishun Guan wrote: > > Congrats Manikumar! > > On Thu, Oct 11, 2018 at 1:20 PM Sönke Liebau > > wrote: > >> > >> Great news, congratulations Manikumar!! > >> > >> On Thu, Oct 11, 2018 at 9:08 PM Vahid Hashemian > > >> wrote: > >> > >>> Congrats Manikumar! > >>> > >>> On Thu, Oct 11, 2018 at 11:49 AM Ryanne Dolan > >>> wrote: > >>> > >>>> Bravo! > >>>> > >>>> On Thu, Oct 11, 2018 at 1:48 PM Ismael Juma > wrote: > >>>> > >>>>> Congratulations Manikumar! Thanks for your continued contributions. > >>>>> > >>>>> Ismael > >>>>> > >>>>> On Thu, Oct 11, 2018 at 10:39 AM Jason Gustafson > > >>>>> wrote: > >>>>> > >>>>>> Hi all, > >>>>>> > >>>>>> The PMC for Apache Kafka has invited Manikumar Reddy as a committer > >>> and > >>>>> we > >>>>>> are > >>>>>> pleased to announce that he has accepted! > >>>>>> > >>>>>> Manikumar has contributed 134 commits including significant work to > >>> add > >>>>>> support for delegation tokens in Kafka: > >>>>>> > >>>>>> KIP-48: > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka > > >>>>>> KIP-249 > >>>>>> < > >>>>> > >>>> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+KafkaKIP-249 > > >>>>>> > >>>>>> : > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient > > >>>>>> > >>>>>> He has broad experience working with many of the core components in > >>>> Kafka > >>>>>> and he has reviewed over 80 PRs. He has also made huge progress > >>>>> addressing > >>>>>> some of our technical debt. > >>>>>> > >>>>>> We appreciate the contributions and we are looking forward to more. > >>>>>> Congrats Manikumar! > >>>>>> > >>>>>> Jason, on behalf of the Apache Kafka PMC > >>>>>> > >>>>> > >>>> > >>> > >> > >> > >> -- > >> Sönke Liebau > >> Partner > >> Tel. +49 179 7940878 > >> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany > > [attachment "signature.asc" deleted by Edoardo Comar/UK/IBM] > > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Re: [VOTE] KIP-376: Implement AutoClosable on appropriate classes that want to be used in a try-with-resource statement
Hi Yishun, Thanks for looking at this. Under "proposed changes," it's not necessary to add a section where you demonstrate adding "implements AutoCloseable" to the code. We know what adding that would look like. Can you create a full, single, list of all the classes that would be affected? It's not necessary to write who suggested which classes in the KIP. Also, I noticed some of the classes here are in "internals" packages. Given that these are internal classes that aren't part of our API, it's not necessary to add them to the KIP, I think. Since they are implementation details, they can be changed at any time without a KIP. The "compatibility" section should have a discussion of the fact that we can add the new interface without requiring any backwards-incompatible changes at the source or binary level. In particular, it would be good to highlight that we are not renaming or changing the existing "close" methods. Under "rejected alternatives" we could explain why we chose to implement AutoCloseable rather than Closeable. cheers, Colin On Thu, Oct 11, 2018, at 13:48, Yishun Guan wrote: > Hi, > > Just to bump this voting thread up again. Thanks! > > Best, > Yishun > On Fri, Oct 5, 2018 at 12:58 PM Yishun Guan wrote: > > > > Hi, > > > > I think we have discussed this well enough to put this into a vote. > > > > Suggestions are welcome! > > > > Best, > > Yishun > > > > On Wed, Oct 3, 2018, 2:30 PM Yishun Guan wrote: > >> > >> Hi All, > >> > >> I want to start a voting on this KIP: > >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=93325308 > >> > >> Here is the discussion thread: > >> https://lists.apache.org/thread.html/9f6394c28d3d11a67600d5d7001e8aaa318f1ad497b50645654bbe3f@%3Cdev.kafka.apache.org%3E > >> > >> Thanks, > >> Yishun
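For context, a minimal sketch of what the proposed change enables for callers, using a hypothetical class name; the point is that the existing close() method is neither renamed nor changed, the class merely declares the AutoCloseable contract:

    // A hypothetical resource class standing in for the classes listed in the KIP.
    public class ExampleResource implements AutoCloseable {
        @Override
        public void close() {
            // release sockets, file handles, etc., exactly as before
        }

        public static void main(String[] args) {
            // Callers can now use try-with-resources instead of an explicit finally block.
            try (ExampleResource resource = new ExampleResource()) {
                // use the resource; close() runs automatically when the block exits
            }
        }
    }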
Re: [DISCUSS] KIP-377: TopicCommand to use AdminClient
Hi Viktor, Thanks for bumping this thread. I think we should just focus on transitioning the TopicCommand to use AdminClient, and talk about protocol changes in a separate KIP. Protocol changes often involve a lot of discussion. This does mean that we couldn't implement the "list topics under deletion" feature when using AdminClient at the moment. We could add a note to the tool output indicating this. We should move the protocol discussion to a separate thread. Probably also look at KIP-142 as well. best, Colin On Tue, Oct 9, 2018, at 07:45, Viktor Somogyi-Vass wrote: > Hi All, > > Would like to bump this as the conversation sank a little bit, but more > importantly I'd like to validate my plans/ideas on extending the Metadata > protocol. I was thinking about two other alternatives, namely: > 1. Create a ListTopicUnderDeletion protocol. This however would be > unnecessary: it'd have one very narrow functionality which we can't extend. > I'd make sense to have a list topics or describe topics protocol where we > can list/describe topics under deletion but for normal listing/describing > we already use the metadata, so it would be a duplication of functionality. > 2. DeleteTopicsResponse could return the topics under deletion if the > request's argument list is empty which might make sense at the first look, > but actually we'd mix the query functionality with the delete functionality > which is counterintuitive. > > Even though most clients won't need these "limbo" topics (which are under > deletion) in the foreseeable future, it can be considered as part of the > cluster state or metadata and to me it makes sense. Also it doesn't have a > big overhead in the response size as typically users don't delete topics > too often as far as I experienced. > > I'd be happy to receive some ideas/feedback on this. > > Cheers, > Viktor > > > On Fri, Sep 28, 2018 at 4:51 PM Viktor Somogyi-Vass > wrote: > > > Hi All, > > > > I made an update to the KIP. Just in short: > > Currently KafkaAdminClient.describeTopics() and > > KafkaAdminClient.listTopics() uses the Metadata protocol to acquire topic > > information. The returned response however won't contain the topics that > > are under deletion but couldn't complete yet (for instance because of some > > replicas offline), therefore it is not possible to implement the current > > command's "marked for deletion" feature. To get around this I introduced > > some changes in the Metadata protocol. > > > > Thanks, > > Viktor > > > > On Fri, Sep 28, 2018 at 4:48 PM Viktor Somogyi-Vass < > > viktorsomo...@gmail.com> wrote: > > > >> Hi Mickael, > >> > >> Thanks for the feedback, I also think that many customers wanted this for > >> a long time. > >> > >> Cheers, > >> Viktor > >> > >> On Fri, Sep 28, 2018 at 11:45 AM Mickael Maison > >> wrote: > >> > >>> Hi Viktor, > >>> Thanks for taking this task! > >>> This is a very nice change as it will allow users to use this tool in > >>> many Cloud environments where direct zookeeper access is not > >>> available. > >>> > >>> > >>> On Thu, Sep 27, 2018 at 10:34 AM Viktor Somogyi-Vass > >>> wrote: > >>> > > >>> > Hi All, > >>> > > >>> > This is the continuation of the old KIP-375 with the same title: > >>> > > >>> https://lists.apache.org/thread.html/dc71d08de8cd2f082765be22c9f88bc9f8b39bb8e0929a3a4394e9da@%3Cdev.kafka.apache.org%3E > >>> > > >>> > The problem there was that two KIPs were created around the same time > >>> and I > >>> > chose to reorganize mine a bit and give it a new number to avoid > >>> > duplication. 
> >>> > > >>> > The KIP summary here once again: > >>> > > >>> > I wrote up a relatively simple KIP about improving the Kafka protocol > >>> and > >>> > the TopicCommand tool to support the new Java based AdminClient and > >>> > hopefully to deprecate the Zookeeper side of it. > >>> > > >>> > I would be happy to receive some opinions about this. In general I > >>> think > >>> > this would be an important addition as this is one of the few left but > >>> > important tools that still uses direct Zookeeper connection. > >>> > > >>> > Here is the link for the KIP: > >>> > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-377%3A+TopicCommand+to+use+AdminClient > >>> > > >>> > Cheers, > >>> > Viktor > >>> > >>
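For illustration, a minimal sketch (not the actual TopicCommand code) of listing and describing topics through the Java AdminClient instead of a direct ZooKeeper connection, which is the direction the KIP proposes; the bootstrap address is a placeholder:

    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.TopicDescription;

    public class ListTopicsSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // --bootstrap-server would replace --zookeeper in the reworked tool
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // List all topic names known to the brokers.
                Set<String> names = admin.listTopics().names().get();
                // Describe them to get partition and replica information.
                for (TopicDescription description :
                        admin.describeTopics(names).all().get().values()) {
                    System.out.println(description.name() + " partitions=" +
                            description.partitions().size());
                }
            }
        }
    }

Note that nothing in this listing can report topics that are still "marked for deletion", which is exactly the gap discussed above.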
Re: [VOTE] KIP-349 Priorities for Source Topics
On Mon, Oct 8, 2018, at 12:35, Thomas Becker wrote: > Well my (perhaps flawed) understanding of topic priorities is that lower > priority topics are not consumed as long as higher priority ones have > unconsumed messages (which means our position < HW). So if I'm doing > this manually, I have to make some determination as to whether my high > priority topic partitions are at the HW before I can decide if I want to > poll the lower priority ones. Right? Hi Thomas, You could periodically check the last committed position of various partitions using KafkaConsumer#committed. But this would be very inefficient. For one thing, you'd have to keep waking up your consumer thread all the time to do this. The two-consumer solution that I suggested earlier just implies that you have two consumers, one for the control data and one for the non-control data. In that case, as long as control data is available, your consumer will always try to read it. It doesn't involve the caller checking committed position using KafkaConsumer#committed at any point. Usually, consumers are reading data that is relatively recent. If the consumer is too slow to keep up with the incoming messages over the long term, the system usually gets into a bad state. I think this is one reason why it's hard to think of use-cases for this feature. If you had a control partition and data partition, the data partition wouldn't really block you from getting the control messages in a timely fashion. You almost certainly need to be able to keep up with both partitions anyway. Also, if you have to do some very expensive processing on data messages, you should be offloading that processing to another thread, rather than doing the expensive thing in your consumer thread. And you can mute a partition while you're processing an expensive message from that partition, so it doesn't really block the processing of other partitions anyway. Maybe there's some really cool use-case that I haven't thought of. But so far I can't really think of any time I would need topic priorities if I was muting topics and offloading blocking operations in a reasonable way. It would be good to identify use-cases because it would motivate choices like how many priorities do we want (2? 256? 4 billion?) and what the API would be like, etc. best, Colin > > On Fri, 2018-10-05 at 11:34 -0700, Colin McCabe wrote: > > On Fri, Oct 5, 2018, at 10:58, Thomas Becker wrote: > > Colin, > > Would you mind sharing your vision for how this looks with multiple > > consumers? I'm still getting my bearings with the new consumer but it's > > not immediately obvious to me how this would work. > > > Hi Thomas, > > > I was just responding to the general idea that you would have some kind > of control topic that you wanted to read with very low latency, and some > kind of set of data topics where the latency requirements are less > strict. In that case, you can just have two consumers: one for the low- > latency topic, and one for the less low-latency topics. > > > There's a lot of things in this picture that are unclear. Does the data > in one set of topics have any relation to the data in the other? Why do > we want a control channel distinct from the data channel? That's why I > asked for clarification on the use-case. > > > In particular, it doesn't seem particularly easy to know when you are at > the high > > watermark of a topic. > > > KafkaConsumer#committed will return the last committed offset for a > partition. 
However, I'm not sure I understand why you want this > information in this case-- can you expand a bit on this? > > > best, > > Colin > > > > > -Tommy > > > On Mon, 2018-10-01 at 13:43 -0700, Colin McCabe wrote: > > > Hi all, > > > > I feel like the DISCUSS thread didn't really come to a conclusion, so a > > vote would be premature here. > > > > In particular, I still don't really understand the use-case for this > > feature. Can someone give a concrete scenario where you would need > > this? The control plane / data plane example that is listed in the KIP > > doesn't require this feature. You can just have one consumer for the > > control plane, and one for the data plane, and do priority that way. > > The discussion feels kind of unfocused since we haven't identified even > > one concrete use-case that needs this feature. > > > > Unfortunately, this is a feature which consumes server-side memory. We > > have to store the priorities somehow when doing incremental fetch > > requests. If we go with an in
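For concreteness, a rough sketch of the two-consumer approach described above: one consumer dedicated to the control topic and one for the data topic, each polled from its own thread, so control messages are read promptly regardless of how far behind the data consumer is (topic names, group ids, and the broker address are made up):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TwoConsumerSketch {
        static KafkaConsumer<String, String> newConsumer(String groupId) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", groupId);
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            return new KafkaConsumer<>(props);
        }

        static Runnable pollLoop(String topic, String groupId) {
            return () -> {
                try (KafkaConsumer<String, String> consumer = newConsumer(groupId)) {
                    consumer.subscribe(Collections.singletonList(topic));
                    while (true) {
                        ConsumerRecords<String, String> records =
                                consumer.poll(Duration.ofMillis(100));
                        for (ConsumerRecord<String, String> record : records) {
                            System.out.println(topic + ": " + record.value());
                        }
                    }
                }
            };
        }

        public static void main(String[] args) {
            // The control consumer always polls, so it is never starved by data traffic.
            new Thread(pollLoop("control-topic", "control-group")).start();
            new Thread(pollLoop("data-topic", "data-group")).start();
        }
    }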
Re: [VOTE] KIP-376: Implement AutoClosable on appropriate classes that want to be used in a try-with-resource statement
On Fri, Oct 12, 2018, at 15:45, Yishun Guan wrote: > Hi Colin, > > Thanks for your suggestions. I have modified the current KIP with your > comments. However, I still think I should keep the entire list, because it > is a good way to keep track of which class need to be change, and others > can discuss if changes on these internal classes are necessary? Hi Yishun, I guess I don't feel that strongly about it. If you want to keep the internal classes in the list, that's fine. They don't really need to be in the KIP but it's OK if they're there. Thanks for working on this. +1 (binding). best, Colin > > Thanks, > Yishun > > On Fri, Oct 12, 2018 at 11:42 AM Colin McCabe wrote: > > > Hi Yishun, > > > > Thanks for looking at this. > > > > Under "proposed changes," it's not necessary to add a section where you > > demonstrate adding "implements AutoCloseable" to the code. We know what > > adding that would look like. > > > > Can you create a full, single, list of all the classes that would be > > affected? It's not necessary to write who suggested which classes in the > > KIP. Also, I noticed some of the classes here are in "internals" > > packages. Given that these are internal classes that aren't part of our > > API, it's not necessary to add them to the KIP, I think. Since they are > > implementation details, they can be changed at any time without a KIP. > > > > The "compatibility" section should have a discussion of the fact that we > > can add the new interface without requiring any backwards-incompatible > > changes at the source or binary level. In particular, it would be good to > > highlight that we are not renaming or changing the existing "close" methods. > > > > Under "rejected alternatives" we could explain why we chose to implement > > AutoCloseable rather than Closeable. > > > > cheers, > > Colin > > > > > > On Thu, Oct 11, 2018, at 13:48, Yishun Guan wrote: > > > Hi, > > > > > > Just to bump this voting thread up again. Thanks! > > > > > > Best, > > > Yishun > > > On Fri, Oct 5, 2018 at 12:58 PM Yishun Guan wrote: > > > > > > > > Hi, > > > > > > > > I think we have discussed this well enough to put this into a vote. > > > > > > > > Suggestions are welcome! > > > > > > > > Best, > > > > Yishun > > > > > > > > On Wed, Oct 3, 2018, 2:30 PM Yishun Guan wrote: > > > >> > > > >> Hi All, > > > >> > > > >> I want to start a voting on this KIP: > > > >> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=93325308 > > > >> > > > >> Here is the discussion thread: > > > >> > > https://lists.apache.org/thread.html/9f6394c28d3d11a67600d5d7001e8aaa318f1ad497b50645654bbe3f@%3Cdev.kafka.apache.org%3E > > > >> > > > >> Thanks, > > > >> Yishun > >
Re: KAFKA-6654 custom SSLContext
In general Kafka makes an effort to be language-neutral so that Kafka clients can be implemented on platforms other than Java. For example, we have things like librdkafka which allow people to access Kafka from C or Python. Unless I'm misunderstanding something, giving direct access to the SSLContext and SSLSocketFactory seems like it would make that kind of compatibility harder, if it were even still possible. I'm curious if there's a way to do this by adding configuration entries for what you need? best, Colin On Mon, Oct 15, 2018, at 13:20, Pellerin, Clement wrote: > I am new to this mailing list. I am not sure what I should do next. > Should I create a KIP to discuss this? > > -Original Message- > From: Pellerin, Clement > Sent: Wednesday, October 10, 2018 4:38 PM > To: dev@kafka.apache.org > Subject: KAFKA-6654 custom SSLContext > > KAFKA-6654 correctly states that there will never be enough > configuration parameters to fully configure the SSLContext/ > SSLSocketFactory created by Kafka. > For example, in our case, we need an alias to choose the key in the > keystore, and we need an implementation of OCSP. > KAFKA-6654 suggests to make the creation of the SSLContext a pluggable > implementation. > Maybe by declaring an interface and passing the name of an > implementation class in a new parameter. > > Many libraries solve this problem by accepting the SSLContextFactory > instance from the application. > How about passing the instance as the value of a runtime configuration > parameter? > If that parameter is set, all other ssl.* parameters would be ignored. > Obviously, this parameter could only be set programmatically. > > I would like to hear the proposed solution by the Kafka maintainers. > > I can help implementing a patch if there is an agreement on the desired > solution.
Re: KAFKA-6654 custom SSLContext
Hi Clement, Thanks for the clarification. Perhaps a pluggable interface makes sense here. Maybe someone more familiar with the SSL code can comment. best, Colin On Mon, Oct 15, 2018, at 19:53, Pellerin, Clement wrote: > OK, I can see why passing an instance is not language neutral. > All the libraries I can think of accept the SSLSocketFactory, but they > most likely don't support C or Python. > > My exact use case is to reuse the SSLContext configured in my > application outside Kafka. > I'm afraid no amount of extra configuration properties can achieve that. > It appears the creator of KAFKA-6654 agrees with me. > > I could solve my problem if I could convince SslChannelBuilder to create > my own SslFactory implementation. > The Kafka config already contains properties that hold class names. > Like I suggested before, we could have a property for the class name > that implements an SslFactory interface. > I would also need to pass custom config parameters to my SslFactory > implementation without causing warnings. > By default, the SslFactory implementation would be the one built into > Kafka which uses all the Kafka ssl properties. > > Is that acceptable to resolve KAFKA-6654? > Can you think of a better solution? > > > -Original Message- > From: Colin McCabe [mailto:cmcc...@apache.org] > Sent: Monday, October 15, 2018 7:58 PM > To: dev@kafka.apache.org > Subject: Re: KAFKA-6654 custom SSLContext > > In general Kafka makes an effort to be langauge-neutral so that Kafka > clients can be implemented on platforms other than Java. For example, > we have things like librdkafka which allow people to access Kafka from C > or Python. Unless I'm misunderstanding something, giving direct access > to the SSLContext and SSLSocketFactory seems like it would make that > kind of compatibility harder, if it were even still possible. I'm > curious if there's a way to do this by adding configuration entries for > what you need? > > best, > Colin > > > On Mon, Oct 15, 2018, at 13:20, Pellerin, Clement wrote: > > I am new to this mailing list. I am not sure what I should do next. > > Should I create a KIP to discuss this? > > > > -Original Message- > > From: Pellerin, Clement > > Sent: Wednesday, October 10, 2018 4:38 PM > > To: dev@kafka.apache.org > > Subject: KAFKA-6654 custom SSLContext > > > > KAFKA-6654 correctly states that there will never be enough > > configuration parameters to fully configure the SSLContext/ > > SSLSocketFactory created by Kafka. > > For example, in our case, we need an alias to choose the key in the > > keystore, and we need an implementation of OCSP. > > KAFKA-6654 suggests to make the creation of the SSLContext a pluggable > > implementation. > > Maybe by declaring an interface and passing the name of an > > implementation class in a new parameter. > > > > Many libraries solve this problem by accepting the SSLContextFactory > > instance from the application. > > How about passing the instance as the value of a runtime configuration > > parameter? > > If that parameter is set, all other ssl.* parameters would be ignored. > > Obviously, this parameter could only be set programmatically. > > > > I would like to hear the proposed solution by the Kafka maintainers. > > > > I can help implementing a patch if there is an agreement on the desired > > solution.
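To make the shape of the idea concrete, a purely hypothetical sketch of a pluggable factory: the interface name, the configuration key mentioned in the comment, and the method signatures are invented for discussion and are not an existing Kafka API:

    import java.util.Map;
    import javax.net.ssl.SSLContext;

    // Hypothetical plug-in point: the broker or client would instantiate the class
    // named by a new config (e.g. something like "ssl.context.provider.class") and
    // ask it for the SSLContext instead of building one from the ssl.* properties.
    public interface SslContextProvider {
        // Receives the configs so an implementation can read its own custom
        // settings (key alias, OCSP options, ...) without triggering warnings.
        void configure(Map<String, ?> configs);

        SSLContext createSslContext();
    }

    // A trivial implementation returning the JVM default context; a real one would
    // build a context with a specific key alias, OCSP checking, and so on.
    class DefaultSslContextProvider implements SslContextProvider {
        @Override
        public void configure(Map<String, ?> configs) {
        }

        @Override
        public SSLContext createSslContext() {
            try {
                return SSLContext.getDefault();
            } catch (java.security.NoSuchAlgorithmException e) {
                throw new RuntimeException(e);
            }
        }
    }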
Re: [DISCUSS] KIP-377: TopicCommand to use AdminClient
Hi Viktor, Sounds good. If you want to propose a way of improving the metadata protocol so that "[deleted]" could be supported, you could probably create that KIP in parallel. The last KIP in that area that I can remember is KIP-142, which didn't get adopted (yet?) https://cwiki.apache.org/confluence/display/KAFKA/KIP-142%3A+Add+ListTopicsRequest+to+efficiently+list+all+the+topics+in+a+cluster There have been other discussions though. In general there are a lot of features that would be nice to have in the metadata protocol (pagniation, regexes, skip stuff we don't need). best, Colin On Tue, Oct 16, 2018, at 10:11, Viktor Somogyi-Vass wrote: > Hi Colin, > > Thanks, it makes sense and simplifies this KIP tremendously. I'll move this > section to the rejected alternatives with a note that KIP-142 will have > this feature. > On a similar note: is there a KIP for describe topics protocol or have you > been thinking about it? I guess there it's the same problem, we often don't > want to forward the entire metadata. > > Viktor > > On Fri, Oct 12, 2018 at 12:03 PM Colin McCabe wrote: > > > Hi Viktor, > > > > Thanks for bumping this thread. > > > > I think we should just focus on transitioning the TopicCommand to use > > AdminClient, and talk about protocol changes in a separate KIP. Protocol > > changes often involve a lot of discussion. This does mean that we couldn't > > implement the "list topics under deletion" feature when using AdminClient > > at the moment. We could add a note to the tool output indicating this. > > > > We should move the protocol discussion to a separate thread. Probably > > also look at KIP-142 as well. > > > > best, > > Colin > > > > > > On Tue, Oct 9, 2018, at 07:45, Viktor Somogyi-Vass wrote: > > > Hi All, > > > > > > Would like to bump this as the conversation sank a little bit, but more > > > importantly I'd like to validate my plans/ideas on extending the Metadata > > > protocol. I was thinking about two other alternatives, namely: > > > 1. Create a ListTopicUnderDeletion protocol. This however would be > > > unnecessary: it'd have one very narrow functionality which we can't > > extend. > > > I'd make sense to have a list topics or describe topics protocol where we > > > can list/describe topics under deletion but for normal listing/describing > > > we already use the metadata, so it would be a duplication of > > functionality. > > > 2. DeleteTopicsResponse could return the topics under deletion if the > > > request's argument list is empty which might make sense at the first > > look, > > > but actually we'd mix the query functionality with the delete > > functionality > > > which is counterintuitive. > > > > > > Even though most clients won't need these "limbo" topics (which are under > > > deletion) in the foreseeable future, it can be considered as part of the > > > cluster state or metadata and to me it makes sense. Also it doesn't have > > a > > > big overhead in the response size as typically users don't delete topics > > > too often as far as I experienced. > > > > > > I'd be happy to receive some ideas/feedback on this. > > > > > > Cheers, > > > Viktor > > > > > > > > > On Fri, Sep 28, 2018 at 4:51 PM Viktor Somogyi-Vass < > > viktorsomo...@gmail.com> > > > wrote: > > > > > > > Hi All, > > > > > > > > I made an update to the KIP. Just in short: > > > > Currently KafkaAdminClient.describeTopics() and > > > > KafkaAdminClient.listTopics() uses the Metadata protocol to acquire > > topic > > > > information. 
The returned response however won't contain the topics > > that > > > > are under deletion but couldn't complete yet (for instance because of > > some > > > > replicas offline), therefore it is not possible to implement the > > current > > > > command's "marked for deletion" feature. To get around this I > > introduced > > > > some changes in the Metadata protocol. > > > > > > > > Thanks, > > > > Viktor > > > > > > > > On Fri, Sep 28, 2018 at 4:48 PM Viktor Somogyi-Vass < > > > > viktorsomo...@gmail.com> wrote: > > > > > > > >> Hi Mickael, > > > >> > > > >> Thanks for the feedback, I a
Re: [VOTE] KIP-349 Priorities for Source Topics
On Thu, Oct 18, 2018, at 09:23, n...@afshartous.com wrote: > > > > On Oct 12, 2018, at 5:06 PM, Colin McCabe wrote: > > > > Maybe there's some really cool use-case that I haven't thought of. But so > > far I can't really think of any time I would need topic priorities if I was > > muting topics and offloading blocking operations in a reasonable way. It > > would be good to identify use-cases > > > Hi Colin, > > How about the use-case where there are multiple streams/topics, and the > intent is to have a single consumer interleave the messages so that > higher priority messages are processed first ? > That seems to be what the reporter of the associated Jira ticket > >https://issues.apache.org/jira/browse/KAFKA-6690 > <https://issues.apache.org/jira/browse/KAFKA-6690> > > has identified as a use-case he frequently encounters. I’ve asked him > to elaborate on the dev list though he has not responded yet. > > Best, > -- > Nick Thanks, Nick. It will be interesting to hear more about that. best, Colin > > >
Re: Throwing away prefetched records optimisation.
Ryanne Dolan wrote: > It sounds to me like this problem is due to Akka attempting to implement > additional backpressure on top of the Consumer API. I'd suggest they not do > that, and then this problem goes away. Imagine a very simple case where you want to consume from three partitions at about the same rate, but the messages in those partitions have different average sizes. You can't keep the consumption rate the same without occassionally pausing the fast partitions. I think we should encourage people to use pause and resume when it's appropriate. After all, Kafka Streams uses pause and resume, for much the same reasons. It's demonstrably a simple and useful API. There's nothing wrong with frameworks implementing backpressure. Jan Filipiak wrote: > The idea for you would be that Messagechooser could hang on to the > prefetched messages. > > ccing cmcc...@apache.org > > @Collin > just for you to see that MessageChooser is a powerfull abstraction. Samza has a lot of powerful abstractions. However, in this case, you can get everything you need just by using pause and resume on partitions. And then Zahari can implement some powerful abstractions of his own, in Akka Streams. :) Matthias J. Sax wrote: > Heartbeats are send by a backgroud thread since 0.10.0.0. And there is a > second configuration `max.poll.interval.ms` that you can increase for a > slow consumer. Kafka Streams sets it to `Integer.MAX_VALUE` for example. Right, that's a very important point. Heartbeats are sent by a background thread now. Zahari Dichev wrote: > Just a mall thing to mention though, blocking on > the poll forever is not something that I am convinced we would like to do. There is a KafkaConsumer#wakeup method which might help here. > * Wakeup the consumer. This method is thread-safe and is useful in > particular to abort a long poll. best, Colin On Sun, Oct 21, 2018, at 14:59, Matthias J. Sax wrote: > It's spread out multiple classes... > > Good starting point is here: > https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L806 > > It implements the main-loop that polls, addRecordsToTasks() (ie, put the > into buffers), and processes record. > > pause/resume is done here: > > https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L720 > > and > > https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L362 > > > Not blocking forever makes sense of course. I did not mean in literally. > However, longer poll() blocking times, should allow you to drain the > consumer buffer better. > > > -Matthias > > On 10/21/18 12:54 PM, Zahari Dichev wrote: > > Thanks for your feedback Matthias, Do you think you can point me to the > > part where Kafka streams deals with all of that so I can take a look. Will > > try and see whether your suggested approach works for us before trying to > > argue my point further. Just a mall thing to mention though, blocking on > > the poll forever is not something that I am convinced we would like to do. > > > > Zahari > > > > On Sun, Oct 21, 2018 at 10:11 PM Matthias J. Sax > > wrote: > > > >>> You have one consumer that is quite > >>>> slow so lets say it call poll every 5 seconds, while you need to call > >> poll > >>>> every 1 second to issue a heartbeat (these are made up numbers of > >> course). > >> > >> Heartbeats are send by a backgroud thread since 0.10.0.0. 
And there is a > >> second configuration `max.poll.interval.ms` that you can increase for a > >> slow consumer. Kafka Streams sets it to `Integer.MAX_VALUE` for example. > >> > >> For the remainder, I agree, that using a buffer on top might not a > >> perfect solution. That is, what I meant by: > >> > >>> (not 100% percent guarantee, > >>> depending on fetch size and max.poll.record etc, but with high > >>> probability) > >> > >> However, I believe, that if you configure the consumer accordingly, you > >> can to drain the fetch buffer if you block on `poll()` forever. > >> > >> I have to admit, than I am not familiar with the details of pipelining > >> fetch requests thought. The general idea is, to make sure to drain the > >> internal buffer of the consumer, before you call `pause()`. > >> > >> Curious to to learn why this would not work? How does the pipelining of > >> fetch requests works in d
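As a small illustration of the pause/resume pattern discussed in this thread (topic name, buffer bound, and poll timeout are arbitrary): the consumer keeps calling poll(), so heartbeats and liveness are unaffected, while a partition whose downstream buffer is full is paused and later resumed once the buffer drains:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Queue;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PauseResumeSketch {
        static final int MAX_BUFFERED = 1000;

        // The queue is assumed to be a thread-safe buffer drained by a separate
        // processing thread, so expensive work stays off the consumer thread.
        public static void run(KafkaConsumer<String, String> consumer,
                               Queue<ConsumerRecord<String, String>> buffer) {
            TopicPartition tp = new TopicPartition("events", 0);
            consumer.assign(Collections.singletonList(tp));
            while (true) {
                // Keep polling so the consumer stays alive even while paused.
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofMillis(500))) {
                    buffer.add(record);
                }
                if (buffer.size() >= MAX_BUFFERED) {
                    // Downstream is slow: stop fetching this partition for now.
                    consumer.pause(Collections.singleton(tp));
                } else if (consumer.paused().contains(tp)) {
                    // Downstream caught up: start fetching again.
                    consumer.resume(Collections.singleton(tp));
                }
            }
        }
    }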
Re: Throwing away prefetched records optimisation.
On Tue, Oct 23, 2018, at 12:38, Zahari Dichev wrote: > Hi there Matthias, I looked through the code of Kafka Streams. Quite > impressive work ! If I have to put the logic of buffering within the > context of what we are doing in Akka though, I might end up with the > following situation. > > 1. Poll is called with two partition being active *TP1, TP2* > 2. We get some data for both, both of them also prefetch some data. > 3. So now we have some data that we have obtained and some data that sits > with the buffer of the fetcher, waiting to be obtained. > 4. We put the data that we have obtained from the poll into the respective > buffers of the partitions. > 5. Since both of our buffers are "full", we call pause on both *TP1* and > *TP2*. > 6. A little time has passed and the client of *TP1* has processed all its > records from the buffer, while the one of *TP2* has processed none > 7. Buffer of *TP1* gets empty, we call resume on *TP1* > 8. We call poll again with *TP1* resumed and *TP2* paused. > 9. We get some records for TP1 and we throw away all the records that were > prefetched for *TP2* in step 2 > > This can go on and on and due to the dynamic nature of the speed of > processing records and the theoretically unlimited number of topic > partitions, I find it possible that this scenario can happen more than once > over the lifetime of a client. And instead of trying to calculate the > probability of this happening and attempt to minimise it, I would prefer to > have one of two options: > > 1. Having control to allow me to enable the returning of already prefetched > data, and simply store it in a buffer of my own until I have enough > capacity to deal with it > > OR > > 2. Keep the data in the fetcher and not throw it away but use it on the > next poll (not sure how viable that is as I have not looked at the details > of it all). I haven't thought about it that hard, but it sounds like the second option might be better. I have a hard time thinking of a case where we actually want to throw away data for paused partitions. If you're still subscribed to it, presumably you'll eventually unpause it and use the cache, right? It makes sense for unsubscribe to clear those records, but not pause, as far as I can see. best, Colin > > The first option is what I suggested initially and the second option is the > one that will allow us to skip the introduction of a configuration > parameter as Colin suggested. These are the things I can suggest at the > moment. As mentioned, I am willing to carry out the work. There is also an > official discussion thread, but I guess we have deviated from that, so I > can just put that current on in JIRA instead if that is OK ? > > Matthias, regarding how the fetcher works. From what I have looked at, > whenever the consumer polls and returns some data, we immediately issue > another fetch request that delivered us records that are returned on the > next poll. All these fetched records, that have not made it to the caller > of poll but have been fetched are thrown away in case at the time of the > nest poll() the partition is in paused state. This is what is causing the > inefficiency. > > Any more comments are welcome. > > On Mon, Oct 22, 2018 at 6:00 AM Ismael Juma wrote: > > > Hi, > > > > I think a KIP to discuss a concrete proposal makes sense. One suggestion is > > to explore the possibility of fixing the issue without a new config. Would > > that break existing users? Generally, we should strive for avoiding configs > > if at all possible. 
> > > > Ismael > > > > On 16 Oct 2018 12:30 am, "Zahari Dichev" wrote: > > > > Hi there Kafka developers, > > > > I am currently trying to find a solution to an issue that has been > > manifesting itself in the Akka streams implementation of the Kafka > > connector. When it comes to consuming messages, the implementation relies > > heavily on the fact that we can pause and resume partitions. In some > > situations when a single consumer instance is shared among several streams, > > we might end up with frequently pausing and unpausing a set of topic > > partitions, which is the main facility that allows us to implement back > > pressure. This however has certain disadvantages, especially when there are > > two consumers that differ in terms of processing speed. > > > > To articulate the issue more clearly, imagine that a consumer maintains > > assignments for two topic partitions *TP1* and *TP2*. This consumer is > > shared by two streams - S1 and S2. So effectively when we have demand from > > only one of the streams - *S1*, we will pause one of t
Re: Throwing away prefetched records optimisation.
In general, the official discussion thread for a KIP starts after the KIP is written and posted. So you would typically start a new email thread with a [DISCUSS] string in the title somewhere. You can certainly link back to this email thread if you want, though, since it has some useful context for everything. best, Colin On Tue, Oct 23, 2018, at 23:21, Zahari Dichev wrote: > Colin, I agree > > I will take a closer looks at the Fetcher itself and see whether that is > feasible and update my KIP accordingly. I guess we can label this one, the > official discussion thread for it or should I start another one ? > > Zahari > > On Wed, Oct 24, 2018 at 6:43 AM Colin McCabe wrote: > > > On Tue, Oct 23, 2018, at 12:38, Zahari Dichev wrote: > > > Hi there Matthias, I looked through the code of Kafka Streams. Quite > > > impressive work ! If I have to put the logic of buffering within the > > > context of what we are doing in Akka though, I might end up with the > > > following situation. > > > > > > 1. Poll is called with two partition being active *TP1, TP2* > > > 2. We get some data for both, both of them also prefetch some data. > > > 3. So now we have some data that we have obtained and some data that sits > > > with the buffer of the fetcher, waiting to be obtained. > > > 4. We put the data that we have obtained from the poll into the > > respective > > > buffers of the partitions. > > > 5. Since both of our buffers are "full", we call pause on both *TP1* and > > > *TP2*. > > > 6. A little time has passed and the client of *TP1* has processed all its > > > records from the buffer, while the one of *TP2* has processed none > > > 7. Buffer of *TP1* gets empty, we call resume on *TP1* > > > 8. We call poll again with *TP1* resumed and *TP2* paused. > > > 9. We get some records for TP1 and we throw away all the records that > > were > > > prefetched for *TP2* in step 2 > > > > > > This can go on and on and due to the dynamic nature of the speed of > > > processing records and the theoretically unlimited number of topic > > > partitions, I find it possible that this scenario can happen more than > > once > > > over the lifetime of a client. And instead of trying to calculate the > > > probability of this happening and attempt to minimise it, I would prefer > > to > > > have one of two options: > > > > > > 1. Having control to allow me to enable the returning of already > > prefetched > > > data, and simply store it in a buffer of my own until I have enough > > > capacity to deal with it > > > > > > OR > > > > > > 2. Keep the data in the fetcher and not throw it away but use it on the > > > next poll (not sure how viable that is as I have not looked at the > > details > > > of it all). > > > > I haven't thought about it that hard, but it sounds like the second option > > might be better. I have a hard time thinking of a case where we actually > > want to throw away data for paused partitions. If you're still subscribed > > to it, presumably you'll eventually unpause it and use the cache, right? > > It makes sense for unsubscribe to clear those records, but not pause, as > > far as I can see. > > > > best, > > Colin > > > > > > > > > > The first option is what I suggested initially and the second option is > > the > > > one that will allow us to skip the introduction of a configuration > > > parameter as Colin suggested. These are the things I can suggest at the > > > moment. As mentioned, I am willing to carry out the work. 
There is also > > an > > > official discussion thread, but I guess we have deviated from that, so I > > > can just put that current on in JIRA instead if that is OK ? > > > > > > Matthias, regarding how the fetcher works. From what I have looked at, > > > whenever the consumer polls and returns some data, we immediately issue > > > another fetch request that delivered us records that are returned on the > > > next poll. All these fetched records, that have not made it to the caller > > > of poll but have been fetched are thrown away in case at the time of the > > > nest poll() the partition is in paused state. This is what is causing the > > > inefficiency. > > > > > > Any more comments are welcome. > > > > > > On Mon, Oct 22, 2018 at 6:00 AM Ismael Juma wrote: > > > > > > > Hi, > > > &
Re: [VOTE] KIP-377: TopicCommand to use AdminClient
Thanks, Viktor. +1 (binding). One note: can we add a deprecation warning when --zookeeper is used, to indicate that this option will be phased out in the future? best, Colin On Wed, Oct 24, 2018, at 05:47, Mickael Maison wrote: > +1 (non-binding) > Thanks for the KIP! > On Wed, Oct 24, 2018 at 1:28 PM Viktor Somogyi-Vass > wrote: > > > > Hi All, > > > > I'd like to start a vote on KIP-377: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-377%3A+TopicCommand+to+use+AdminClient. > > > > Summary: > > The KIP basically proposes to add --bootstrap-server and > > --command-config option to TopicsCommand and implement topic > > administration with AdminClient in a backwards compatible way (so > > wouldn't drop or change the --zookeeper option usage). > > > > I'd appreciate any votes or feedback. > > > > Viktor
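For reference, a minimal sketch of the kind of AdminClient call the reworked tool would make when creating a topic; the topic name, partition count, and replication factor are arbitrary:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions, replication factor 1 -- purely illustrative values.
                NewTopic topic = new NewTopic("my-topic", 3, (short) 1);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }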
Re: [DISCUSS] KIP-385: Provide configuration allowing consumer to not throw away prefetched data
Hi Zahari, One question we didn't figure out earlier was who would actually want this cached data to be thrown away. If there's nobody who actually wants this, then perhaps we can simplify the proposal by just unconditionally retaining the cache until the partition is resumed, or we unsubscribe from the partition. This would avoid adding a new configuration. best, Colin On Sun, Oct 21, 2018, at 11:54, Zahari Dichev wrote: > Hi there, although it has been discussed briefly already in this thread > <https://lists.apache.org/thread.html/fbb7e9ccc41084fc2ff8612e6edf307fb400f806126b644d383b4a64@%3Cdev.kafka.apache.org%3E>, > I decided to follow the process and initiate a DISCUSS thread. Comments > and > suggestions are more than welcome. > > > Zahari Dichev
Re: [DISCUSS] KIP-385: Provide configuration allowing consumer to not throw away prefetched data
Hi Zahari, I think we can retire the KIP, since the KAFKA-7548 patch should solve the issue without any changes that require a KIP. This is actually the best thing we could do for our users, since things will "just work" more efficiently without a lot of configuration knobs. I think you did an excellent job raising this issue and discussing it. It's a very good contribution to the project even if you don't end up writing the patch yourself. I'm going to take a look at the patch today. If you want to take a look, that would also be good. best, Colin On Thu, Oct 25, 2018, at 12:25, Zahari Dichev wrote: > Hi there Mayuresh, > > Great to heat that this is actually working well in production for some > time now. I have changed the details of the KIP to reflect the fact that as > already discussed - we do not really need any kind of configuration as this > data should not be thrown away at all. Submitting a PR sounds great, > although I feel a bit jealous you (LinkedIn) beat me to my first kafka > commit ;) Not sure how things stand with the voting process ? > > Zahari > > > > On Thu, Oct 25, 2018 at 7:39 PM Mayuresh Gharat > wrote: > > > Hi Colin/Zahari, > > > > I have created a ticket for the similar/same feature : > > https://issues.apache.org/jira/browse/KAFKA-7548 > > We (Linkedin) had a use case in Samza at Linkedin when they moved from the > > SimpleConsumer to KafkaConsumer and they wanted to do this pause and resume > > pattern. > > They realized there was performance degradation when they started using > > KafkaConsumer.assign() and pausing and unPausing partitions. We realized > > that not throwing away the prefetched data for paused partitions might > > improve the performance. We wrote a benchmark (I can share it if needed) to > > prove this. I have attached the findings in the ticket. > > We have been running the hotfix internally for quite a while now. When > > samza ran this fix in production, they realized 30% improvement in there > > app performance. > > I have the patch ready on our internal branch and would like to submit a PR > > for this on the above ticket asap. > > I am not sure, if we need a separate config for this as we haven't seen a > > lot of memory overhead due to this in our systems. We have had this running > > in production for a considerable amount of time without any issues. > > It would be great if you guys can review the PR once its up and see if that > > satisfies your requirement. If it doesn't then we can think more on the > > config driven approach. > > Thoughts?? > > > > Thanks, > > > > Mayuresh > > > > > > On Thu, Oct 25, 2018 at 8:21 AM Colin McCabe wrote: > > > > > Hi Zahari, > > > > > > One question we didn't figure out earlier was who would actually want > > this > > > cached data to be thrown away. If there's nobody who actually wants > > this, > > > then perhaps we can simplify the proposal by just unconditionally > > retaining > > > the cache until the partition is resumed, or we unsubscribe from the > > > partition. This would avoid adding a new configuration. > > > > > > best, > > > Colin > > > > > > > > > On Sun, Oct 21, 2018, at 11:54, Zahari Dichev wrote: > > > > Hi there, although it has been discussed briefly already in this thread > > > > < > > > > > https://lists.apache.org/thread.html/fbb7e9ccc41084fc2ff8612e6edf307fb400f806126b644d383b4a64@%3Cdev.kafka.apache.org%3E > > > >, > > > > I decided to follow the process and initiate a DISCUSS thread. Comments > > > > and > > > > suggestions are more than welcome. 
> > > > > > > > > > > > Zahari Dichev > > > > > > > > > -- > > -Regards, > > Mayuresh R. Gharat > > (862) 250-7125 > >
Re: [VOTE] KIP-349 Priorities for Source Topics
On Thu, Oct 25, 2018, at 18:16, n...@afshartous.com wrote: > > The reporter of KAFKA-6690 (Bala) replied in the JIra ticket to my > question to elaborate about his use-case. I don’t think he’s on the dev > list. Here’s his response: > > Bala: Sorry about the delay in reply. We use Kafka to process the > asynchronous events of our Document Management System such as preview > generation, indexing for search etc. The traffic gets generated via Web > and Desktop Sync application. In such cases, we had to prioritize the > traffic from web and consume them first. But this might lead to the > starvation of events from sync if the consumer speed is slow and the > event rate is high from web. A solution to handle the starvation with a > timeout after which the events are consumed normally for a specified > period of time would be great and help us use our resources effectively. Priorities won't help for this use-case, right? If the "web" partition has a higher priority, and data is always available, there will *never* be any events reported for "sync". Priorities don't prevent starvation-- they cause starvation by design, because the high priority partition always takes priority. In general the best solution would probably be to have a work queue between the consumer and the event handler, and manage the backpressure as appropriate. This could be done with pause and resume, as Streams does. best, Colin > > -- > Nick > > > > > > On Oct 18, 2018, at 12:23 PM, n...@afshartous.com wrote: > > > >> On Oct 12, 2018, at 5:06 PM, Colin McCabe wrote: > >> > >> Maybe there's some really cool use-case that I haven't thought of. But so > >> far I can't really think of any time I would need topic priorities if I > >> was muting topics and offloading blocking operations in a reasonable way. > >> It would be good to identify use-cases > > > > > > Hi Colin, > > > > How about the use-case where there are multiple streams/topics, and the > > intent is to have a single consumer interleave the messages so that higher > > priority messages are processed first ? > > That seems to be what the reporter of the associated Jira ticket > > > > https://issues.apache.org/jira/browse/KAFKA-6690 > > <https://issues.apache.org/jira/browse/KAFKA-6690> > > > > has identified as a use-case he frequently encounters. I’ve asked him to > > elaborate on the dev list though he has not responded yet. > > > > Best, > > -- > > Nick > > > > > > > > > > >
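A rough sketch of the work-queue idea, using the "web" and "sync" topic names from the use case above: one consumer feeds a bounded queue that a worker thread drains, so both topics keep flowing, and the consumer pauses fetching when the worker falls behind rather than starving either topic (queue size and poll timeout are arbitrary):

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class WorkQueueSketch {
        public static void run(KafkaConsumer<String, String> consumer) throws InterruptedException {
            BlockingQueue<ConsumerRecord<String, String>> queue = new ArrayBlockingQueue<>(10_000);

            // Worker thread: does the expensive handling (preview generation,
            // indexing, ...) off the consumer thread.
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        ConsumerRecord<String, String> record = queue.take();
                        // process the record here
                    }
                } catch (InterruptedException e) {
                    // shutting down
                }
            });
            worker.start();

            consumer.subscribe(Arrays.asList("web", "sync"));
            while (true) {
                if (queue.remainingCapacity() < 500) {
                    // Not enough room for another batch (500 is the default
                    // max.poll.records): pause everything so poll() keeps the
                    // consumer alive but fetches nothing new.
                    consumer.pause(consumer.assignment());
                } else if (!consumer.paused().isEmpty()) {
                    consumer.resume(consumer.paused());
                }
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    queue.put(record);
                }
            }
        }
    }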
Re: Kafka-node: Kafka client keeps sending request to broker which went down
Hi Shashank, kafka-node is not developed by Apache or the Apache Kafka project. I don't think anyone here has looked at that code. I have heard reports from the field that kafka-node has some very serious bugs, such as not retrying failed requests at all in some circumstances. I do not recommend using this client. https://github.com/Blizzard/node-rdkafka is a better choice. Keep in mind that node-rdkafka is still not part of the Apache Kafka project, so you might want to discuss any issues with that particular project. best, Colin On Tue, Oct 23, 2018, at 22:04, Shashank Sah wrote: > When a broker host goes down/restarts, the kafka client keeps sending the > new requests to the same hosts address. As a result, the requests are > failing with this error: Request timed out after 3ms. > > Node version: v6.8.1 > Kafka-node version: 3.0.1 > Kafka version: 2.11-2.0.0 > Number of Brokers: 3 > Number partitions for topic: 10 > > Some code pointers: > > "clusterConfig" : { > "kafkaHost": "localhost:9092,localhost:9093,localhost:9094", > "autoConnect": true > } > ... > let kafkaClient = new kafka.KafkaClient(clusterConfig); > producer = new kafka.HighLevelProducer(kafkaClient, > cluster.producerConfig); > Promise.promisifyAll(producer); > ... > producer.sendAsync([eventPayload]) > .then(function (data) { > let topicName = eventPayload.topic; > let payLoadSize = (eventPayload || '').length; > logger.eventInfo(topicName, payLoadSize, source); > }) > .catch(function (e) { > logger.produceFailedEvent(eventPayload, source); > throw Error.getErrorObject(errorType, e, topic, source); > }); > > I have the kept the other configurations to default. > It seems there is some issue with kafka-node library, given below are the > logs with the corresponding hosts they are going to connect. The first call > was successful, second failed and third was successful. In the below case > localhost:9092 was down. > > kafka-node:KafkaClient kafka-node-client reconnecting to localhost:9092 +1s > kafka-node:KafkaClient compressing messages if needed +76ms > shashanksah actual broker![BrokerWrapper localhost:9094 (connected: > true) (ready: true) (idle: false) (needAuthentication: false) > (authenticated: false)] > kafka-node:KafkaClient kafka-node-client reconnecting to localhost:9092 > +928ms > kafka-node:KafkaClient compressing messages if needed +422ms > shashanksah actual broker![BrokerWrapper localhost:9092 (connected: > false) (ready: false) (idle: false) (needAuthentication: false) > (authenticated: false)] > kafka-node:KafkaClient missing apiSupport waiting until broker is ready... > +1ms > kafka-node:KafkaClient kafka-node-client reconnecting to localhost:9092 > +583ms > kafka-node:KafkaClient compressing messages if needed +280ms > shashanksah actual broker![BrokerWrapper localhost:9093 (connected: > true) (ready: true) (idle: false) (needAuthentication: false) > (authenticated: false)] > kafka-node:KafkaClient kafka-node-client reconnecting to localhost:9092 > +723ms > > The issue is that it gets the ip of host which is down and then waits on > that host to be ready (for 30secs) and then fails the sent request. If we > change the function ensureBrokerReady in this file: > https://github.com/SOHU-Co/kafka-node/blob/master/lib/kafkaClient.js#L1016 > to something like given below, then this issue does not occurs: > > const ensureBrokerReady = async.ensureAsync((leader, callback) => { > let broker = this.brokerForLeader(leader, longpolling); > console.log("shashanksah actual broker!" 
+ broker); > if (!broker.isReady()) { > this.refreshBrokerMetadata(); > broker = this.brokerForLeader(leader, longpolling); > //console.log("shashanksah broker not ready so refresh and retry!"); > } > if (!broker.isReady()) { > //console.log("shashanksah !broker.isReady"); > logger.debug('missing apiSupport waiting until broker is ready...'); > this.waitUntilReady(broker, callback); > } else { > callback(null); > } > }); > > Please tell if I am missing anything or the RCA is correct. > > Thanks, > > Shashank
Re: [DISCUSS] How hard is it to separate the logic layer and the storage layer of Kafka broker?
On Fri, Nov 2, 2018, at 03:14, Yuanjin Lin wrote: > Hi all, > > I am a software engineer from Zhihu.com. Kafka is so great and used heavily > in Zhihu. There are probably over 2K Kafka brokers in total. > > However, we are suffering from the problem that the performance degrades > rapidly when the number of topics increases (sadly, we are using HDD). Hi Yuanjin, How many partitions are you trying to create? Do you have benchmarks confirming that disk I/O is your bottleneck? There are a few cases where large numbers of partitions may impose CPU and garbage collection burdens. The patch on https://github.com/apache/kafka/pull/5206 illustrates one of them. > We are considering separating the logic layer and the storage layer of Kafka > broker like Apache Pulsar. > > After the modification, a server may have several Kafka brokers and more > topics. Those brokers all connect to a sole storage engine via RPC. The > sole storage can do the load balancing work easily, and avoid creating too > many files which hurts HDD. > > Is it hard? I think replacing the stuff in `Kafka.Log` would be enough, > right? It would help to know what the problem is here. If the problem is a large number of files, then maybe the simplest approach would be creating fewer files. You don't need to introduce a new layer of servers in order to do that. You could use something like RocksDB to store messages and indices, or create your own file format that combines things which were previously separate. For example, we could combine the timeindex and index files. As I understand it, Pulsar made the decision to combine together data from multiple partitions in a single file, sometimes a very large number of partitions. This is great for writing, but not so good if you want to read historical data from a single topic. regards, Colin
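To make the "fewer files" suggestion concrete, a toy sketch of keeping messages for many partitions in one embedded RocksDB store instead of one set of segment and index files per partition. This is only an illustration of the general idea, not a proposed Kafka design, and it assumes the rocksdbjni library is available:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class SingleStoreSketch implements AutoCloseable {
        private final RocksDB db;

        public SingleStoreSketch(String path) throws RocksDBException {
            RocksDB.loadLibrary();
            this.db = RocksDB.open(new Options().setCreateIfMissing(true), path);
        }

        // Key = topic-partition name plus offset, so a single store holds every
        // partition and the filesystem sees a handful of files no matter how many
        // partitions exist.
        private static byte[] key(String topicPartition, long offset) {
            byte[] tp = topicPartition.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(tp.length + Long.BYTES).put(tp).putLong(offset).array();
        }

        public void append(String topicPartition, long offset, byte[] message) throws RocksDBException {
            db.put(key(topicPartition, offset), message);
        }

        public byte[] read(String topicPartition, long offset) throws RocksDBException {
            return db.get(key(topicPartition, offset));
        }

        @Override
        public void close() {
            db.close();
        }
    }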
Re: [VOTE] KIP-374: Add '--help' option to all available Kafka CLI commands
+1 (binding) On Wed, Oct 31, 2018, at 05:42, Srinivas Reddy wrote: > Hi All, > > I would like to call for a vote on KIP-374: > https://cwiki.apache.org/confluence/x/FgSQBQ > > Summary: > Currently, the '--help' option is recognized by some Kafka commands > but not all. To provide a consistent user experience, it would > be nice to > add a '--help' option to all Kafka commands. > > I'd appreciate any votes or feedback. > > -- > Srinivas Reddy > > http://mrsrinivas.com/ > > > (Sent via gmail web)
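For context, this is roughly what a consistent '--help' flag looks like with the jopt-simple parser that the Kafka command line tools already use. The class and option names below are made up for illustration and are not taken from the KIP:

    import java.io.IOException;
    import joptsimple.OptionParser;
    import joptsimple.OptionSet;

    public class HelpOptionSketch {
        public static void main(String[] args) throws IOException {
            OptionParser parser = new OptionParser();
            parser.accepts("bootstrap-server", "The Kafka server to connect to.")
                    .withRequiredArg().ofType(String.class);
            // forHelp() tells jopt-simple to skip required-option checks when
            // --help is present, so help output always works.
            parser.accepts("help", "Print usage information.").forHelp();

            OptionSet options = parser.parse(args);
            if (options.has("help")) {
                parser.printHelpOn(System.out);
                return;
            }
            System.out.println("bootstrap server: " + options.valueOf("bootstrap-server"));
        }
    }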
Re: Kafka Performance Producer - Extension
On Thu, Nov 8, 2018, at 01:38, Srinivas, Kaushik (Nokia - IN/Bangalore) wrote:> Hi All, > > This is in regard to extend few features support for java kafka > performance producer.> Hi Kaushik, Have you created a pull request against the Kafka repo at https://github.com/apache/kafka/ ? I also suggest checking out Trogdor, which has a suite of many workloads and stress tests that can be run against Kafka. See https://github.com/apache/kafka/blob/trunk/TROGDOR.md > We have a kafka producer application written in java with maven build.> > Attaching the documentation which gives detailed description of > features supported by this application as of now.> > Would like to know if this can be integrated into kafka performance > producer java implementation which is present with limited features.> Kindly > review the attached document, and help with feedback on how > well this could be integrated. I'm sorry, but I do not open MS Word attachments because of security issues. best, Colin > > > These features would help any team load testing kafka brokers/schema > registry components efficiently.> > Best Regards, > -kaushik > Email had 1 attachment: > * KAFKA_PRODUCER_FEATURES_DOC.docx 22k (application/vnd.openxmlformats- >officedocument.wordprocessingml.document)
Re: [DISCUSS] KIP-388 Add observer interface to record request and response
Hi Lincong Li, I agree that server-side instrumentation is helpful. However, I don't think this is the right approach. The problem is that RequestChannel.Request and AbstractResponse are internal classes that should not be exposed. These are implementation details that we may change in the future. Freezing these into a public API would really hold back the project. For example, for really large responses, we might eventually want to avoid materializing the whole response all at once. It would make more sense to return it in a streaming fashion. But if we need to support this API forever, we can't do that. I think it's fair to say that this is, at best, half a solution to the problem of tracing requests. Users still need to write the plugin code and arrange for it to be on their classpath to make this work. I think the alternative here is not client-side instrumentation, but simply making the change to the broker without using a plugin interface. If a public interface is absolutely necessary here we should expose only things like the API key, client ID, time, etc. that don't constrain the implementation a lot in the future. I think we should also use java here to avoid the compatibility issues we have had with Scala APIs in the past. best, Colin On Thu, Nov 8, 2018, at 11:34, radai wrote: > another downside to client instrumentation (beyond the number of > client codebases one would need to cover) is that in a large > environments you'll have a very long tail of applications using older > clients to upgrade - it would be a long and disruptive process (as > opposed to updating broker-side instrumentation) > On Thu, Nov 8, 2018 at 11:04 AM Peter M. Elias wrote: > > > > I know we have a lot of use cases for this type of functionality at my > > enterprise deployment. I think it's helpful for maintaining reliability of > > the cluster especially and identifying clients that are not properly tuned > > and therefore applying excessive load to the brokers. Additionally, there > > is a bit of a dark spot without something like as currently. For example, > > if a client is not using a consumer group, there is no direct way to query > > the state of the consumer without looking at raw network connections to > > determine the extent of the traffic generated by that particular consumer. > > > > While client instrumentation can certainly help with this currently, given > > that Kafka is intended to be a shared service across a potentially very > > large surface area of clients, central observation of client activity is in > > my opinion an essential feature. > > > > Peter > > > > On Thu, Nov 8, 2018 at 12:13 PM radai wrote: > > > > > bump. > > > > > > I think the proposed API (Observer) is useful for any sort of > > > multi-tenant environment for chargeback and reporting purposes. > > > > > > if no one wants to comment, can we initiate a vote? > > > On Mon, Nov 5, 2018 at 6:31 PM Lincong Li wrote: > > > > > > > > Hi everyone. Here > > > > < > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-388%3A+Add+observer+interface+to+record+request+and+response > > > > > > > > is > > > > my KIP. Any feedback is appreciated. > > > > > > > > Thanks, > > > > Lincong Li > > >
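To make the "expose only things like the API key, client ID, time, etc." suggestion concrete, here is a purely hypothetical sketch of what such a narrow, Java-only observer interface could look like. None of these names exist in Kafka; the sketch just shows an API surface that avoids exposing RequestChannel.Request or AbstractResponse:

    import java.util.Map;

    /**
     * Hypothetical broker-side observer: the broker would hand over a few stable,
     * implementation-independent fields, never its internal request objects.
     */
    public interface RequestObserver {
        void configure(Map<String, ?> configs);

        void onRequest(RequestSummary summary);

        void close();

        /** Immutable value object carrying only stable metadata about one request. */
        final class RequestSummary {
            private final short apiKey;
            private final short apiVersion;
            private final String clientId;
            private final String listenerName;
            private final long receivedTimeMs;
            private final long requestSizeBytes;

            public RequestSummary(short apiKey, short apiVersion, String clientId,
                                  String listenerName, long receivedTimeMs,
                                  long requestSizeBytes) {
                this.apiKey = apiKey;
                this.apiVersion = apiVersion;
                this.clientId = clientId;
                this.listenerName = listenerName;
                this.receivedTimeMs = receivedTimeMs;
                this.requestSizeBytes = requestSizeBytes;
            }

            public short apiKey() { return apiKey; }
            public short apiVersion() { return apiVersion; }
            public String clientId() { return clientId; }
            public String listenerName() { return listenerName; }
            public long receivedTimeMs() { return receivedTimeMs; }
            public long requestSizeBytes() { return requestSizeBytes; }
        }
    }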
Re: Heads up: javac warnings are now treated as errors
Sounds good. Thanks for cleaning up the warnings! Colin On Mon, Nov 12, 2018, at 22:29, Ismael Juma wrote: > Hi all, > > As part of KAFKA-7612, all javac warnings were fixed or suppressed. To > prevent them from reappearing, javac warnings are now treated as > errors. We > still have some scalac warnings (see KAFKA-7614 for details on > what's > needed to eliminate them) and 3 xlint warnings are not yet enabled > (KAFKA-7613). > > Before merging PRs that were submitted before KAFKA-7612 was > merged, it's a > good idea to rerun the PR tests. > > Ismael
Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id
As Jason said, there are definitely scenarios where we know how many group members we expect ahead of time. It would be nice if we could distinguish between the error case of "we expected 5 clients in the group, but one failed" and a case like "4 clients started up quickly but the 5th took an extra 2 seconds." We can sandbag the group rebalance delay, but that's a hack which has clear disadvantages. It would also be nice to be able to detect when a group member left the group briefly but then came back. I think both of these issues could be solved by having some broker-side metadata about groups which is configured through the admin client. If there was an "expected group size," stored on the broker-side, then we could rebalance immediately whenever the group size reached that size. Otherwise, we could apply the rebalance delay, like now. This would give lower latency when setting things up. Expected group size is just an expectation, so the group would be allowed to get bigger than that. We could also have another number which was the maximum group size. This really would be a hard upper limit on the size of the group, which admins could optionally configure. When a new client joined a group, the server could send back a unique random 64-bit member ID. The client could hold on to this ID and use it whenever it rejoined the group after a failure. Since the ID is random and provided by the server, it can't be spoofed or accidentally reused by a misconfigured client. best, Colin On Fri, Nov 16, 2018, at 00:04, Jason Gustafson wrote: > > > > If we initialize a set of member names (I assume ids = names here) on > > broker through Admin API, the client needs to pick up this information > > simultaneously which I doubt if there is a generic way to achieve that? It > > would also make the scaling operations difficult if we need to define the > > member names every time we change the member set which is an extra > > operation burden. From my daily ops experience, dynamically generate member > > names on client side would be easier. Is there a good approach to address > > this issue? > > > Yeah, that's a good question. I'm hoping someone with more kubernetes > experience will jump in here. Basically my goal is to have an approach > which maps nicely to StatefulSets ( > https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/). > The pods in a stateful set have an ordinal index, which sounds similar to > the static ids that I was describing. You can scale up and down a stateful > set, but you would need a plugin to grow and shrink the consumer group. > Sounds like it could work, but I'm not sure if it's the best way. > > At Pinterest we maintain 24-7 SLA for 15 - 30 minutes reaction to all the > > critical streaming services abnormality. One of the burden was the night > > shift which requires the oncaller to quickly resolve the issue and get the > > streaming application back on track, however there is a chance of miss. My > > concern was that if we forfeit the timeout on static membership to trigger > > rebalance, missing some pages during midnight could be negatively > > impacting the system performance since we may realize that some partitions > > stop working for a couple of hours already until next morning. So > > registration timeout serves as the "last line of defense" to guarantee > > liveness if no human intervention jumps in. > > > Thanks, this is helpful background. I agree this is a risk in the approach > I've suggested. 
If we take a step back, I think there are two gaps in the > protocol for stateful applications: > > 1. We don't have a way to detect the same member across failures or > restarts. I think streams has some heuristic to try and handle the common > cases (such as rolling restarts), but the proposal here solves the problem > in a more robust way. > > 2. We don't have a way to know what the expected membership of the group > is. This leads us to try tricks like inserting delays into the rebalance > logic so that the group membership has time to stabilize before we make any > decisions. In your proposal, we have an expansion timeout, which is > basically the same thing as far as I can tell. > > I think the first problem is the most important, but it would be nice if we > can solve the second problem as well. If we have a way to indicate the > expected group members, then the group can respond to a change much more > quickly. There would be no need to wait 5 minutes for all members to join > and it would be robust in the presence of failures. Ironically, static > membership in this case makes the group more dynamic ;). > > That
Re: [VOTE] KIP-354 Time-based log compaction policy
Hi Xiongqi, Thinking about this a little bit more, it seems like we don't have any guarantees just by looking at the timestamp of the first message in a log segment. Similarly, we don't have any guarantees just by looking at the maxTimestamp of the previous log segment. Old data could appear anywhere-- you could put data that was years old in the middle of a segment from 2018. However, if log.message.timestamp.difference.max.ms is set, then we can make some actual guarantees that old data gets purged-- which is what the GDPR requires, of course. Overall, maybe we can make KIP-354 simpler by just always looking at the timestamp of the first log message. I don't think looking at the maxTimestamp of the previous segment is any more accurate. Aside from that, looks good, since we can get what we need with the combination of this and log.message.timestamp.difference.max.ms. best, Colin On Mon, Nov 26, 2018, at 13:10, xiongqi wu wrote: > Thanks for binding and non-binding votes. > Can I get one more binding vote? > > Thanks in advance! > > Xiongqi (Wesley) Wu > > > On Wed, Nov 14, 2018 at 7:29 PM Matt Farmer wrote: > > > I'm a +1 (non-binding) — This looks like it would have saved us a lot of > > pain in an issue we had to debug recently. I can't go into details, but > > figuring out how to achieve this effect gave me quite a headache. :) > > > > On Mon, Nov 12, 2018 at 1:00 PM xiongqi wu wrote: > > > > > Hi all, > > > > > > Can I have one more vote on this KIP? > > > Any comment is appreciated. > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Add+a+Maximum+Log+Compaction+Lag > > > > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > On Fri, Nov 9, 2018 at 7:56 PM xiongqi wu wrote: > > > > > > > Thanks Dong. > > > > I have updated the KIP. > > > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > > > > On Fri, Nov 9, 2018 at 5:31 PM Dong Lin wrote: > > > > > > > >> Thanks for the KIP Xiongqi. LGTM. +1 (binding) > > > >> > > > >> One minor comment: it may be a bit better to clarify in the public > > > >> interface section that the value of the newly added metric is > > determined > > > >> based by applying that formula across all compactable segments. For > > > >> example: > > > >> > > > >> The maximum value of Math.max(now - > > > >> earliest_timestamp_in_ms_of_uncompacted_segment - > > max.compaction.lag.ms > > > , > > > >> 0)/1000 across all compactable partitions, where the > > > >> max.compaction.lag.ms > > > >> can be overridden on per-topic basis. > > > >> > > > >> > > > >> > > > >> On Fri, Nov 9, 2018 at 5:16 PM xiongqi wu > > wrote: > > > >> > > > >> > Thanks Joel. > > > >> > Tracking the delay at second granularity makes sense > > > >> > I have updated KIP. > > > >> > > > > >> > Xiongqi (Wesley) Wu > > > >> > > > > >> > > > > >> > On Fri, Nov 9, 2018 at 5:05 PM Joel Koshy > > > wrote: > > > >> > > > > >> > > +1 with one suggestion on the proposed metric. You should probably > > > >> > include > > > >> > > the unit. So for e.g., max-compaction-delay-secs. > > > >> > > > > > >> > > Joel > > > >> > > > > > >> > > On Tue, Nov 6, 2018 at 5:30 PM xiongqi wu > > > >> wrote: > > > >> > > > > > >> > > > bump > > > >> > > > Xiongqi (Wesley) Wu > > > >> > > > > > > >> > > > > > > >> > > > On Thu, Sep 27, 2018 at 4:20 PM xiongqi wu > > > > > >> > wrote: > > > >> > > > > > > >> > > > > > > > >> > > > > Thanks Eno, Brett, Dong, Guozhang, Colin, and Xiaohe for > > > >> feedback. > > > >> > > > > Can I have more feedback or VOTE on this KIP? 
> > > >> > > > > > > > >> > > > > > > > >> > > > > Xiongqi (Wesley) Wu > > > >> > > > > > > > >> > >
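For reference, the per-partition delay formula quoted in the vote above works out to something like the following toy calculation (variable names are made up; the reported metric would be the maximum of this value across all compactable partitions):

    public class MaxCompactionDelaySketch {
        // Math.max(now - earliest_timestamp_in_ms_of_uncompacted_segment - max.compaction.lag.ms, 0) / 1000
        static long compactionDelaySecs(long nowMs, long earliestUncompactedTimestampMs,
                                        long maxCompactionLagMs) {
            return Math.max(nowMs - earliestUncompactedTimestampMs - maxCompactionLagMs, 0) / 1000;
        }

        public static void main(String[] args) {
            long nowMs = System.currentTimeMillis();
            long earliest = nowMs - 36 * 60 * 60 * 1000L;  // oldest uncompacted record is 36 hours old
            long maxLagMs = 24 * 60 * 60 * 1000L;          // max.compaction.lag.ms set to one day
            // 36h - 24h = 12h over the lag bound, i.e. 43200 seconds
            System.out.println(compactionDelaySecs(nowMs, earliest, maxLagMs));
        }
    }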
Re: [VOTE] KIP-354 Time-based log compaction policy
Thanks, Xiongqi Wu. +1 (binding) regards, Colin On Tue, Dec 4, 2018, at 20:58, xiongqi (wesley) wu wrote: > Colin, > > Thanks for comments. > Out of ordered message timestamp is a very good point. > We can combine max.compaction.lag.ms with > log.message.timestamp.difference.max.ms to achieve what we want in an > environment that message timestamp can be shifted a lot. > > There are similar discussions regarding log.retention.ms and > log.message.timestamp.difference.max.ms in KAFKA-4340. > > I agree that we can always use first message timestamp not the maxTimestamp > of the previous log segment to make it simple. > > I have updated the KIP. > > Xiongqi (wesley) Wu > > > On Tue, Dec 4, 2018 at 5:13 PM Colin McCabe wrote: > > > Hi Xiongqi, > > > > Thinking about this a little bit more, it seems like we don't have any > > guarantees just by looking at the timestamp of the first message in a log > > segment. Similarly, we don't have any guarantees just by looking at the > > maxTimestamp of the previous log segment. Old data could appear anywhere-- > > you could put data that was years old in the middle of a segment from 2018. > > > > However, if log.message.timestamp.difference.max.ms is set, then we can > > make some actual guarantees that old data gets purged-- which is what the > > GDPR requires, of course. > > > > Overall, maybe we can make KIP-354 simpler by just always looking at the > > timestamp of the first log message. I don't think looking at the > > maxTimestamp of the previous segment is any more accurate. Aside from > > that, looks good, since we can get what we need with the combination of > > this and log.message.timestamp.difference.max.ms. > > > > best, > > Colin > > > > > > On Mon, Nov 26, 2018, at 13:10, xiongqi wu wrote: > > > Thanks for binding and non-binding votes. > > > Can I get one more binding vote? > > > > > > Thanks in advance! > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > On Wed, Nov 14, 2018 at 7:29 PM Matt Farmer wrote: > > > > > > > I'm a +1 (non-binding) — This looks like it would have saved us a lot > > of > > > > pain in an issue we had to debug recently. I can't go into details, but > > > > figuring out how to achieve this effect gave me quite a headache. :) > > > > > > > > On Mon, Nov 12, 2018 at 1:00 PM xiongqi wu > > wrote: > > > > > > > > > Hi all, > > > > > > > > > > Can I have one more vote on this KIP? > > > > > Any comment is appreciated. > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Add+a+Maximum+Log+Compaction+Lag > > > > > > > > > > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > > > > > > > On Fri, Nov 9, 2018 at 7:56 PM xiongqi wu > > wrote: > > > > > > > > > > > Thanks Dong. > > > > > > I have updated the KIP. > > > > > > > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > > > > > > > > > > On Fri, Nov 9, 2018 at 5:31 PM Dong Lin > > wrote: > > > > > > > > > > > >> Thanks for the KIP Xiongqi. LGTM. +1 (binding) > > > > > >> > > > > > >> One minor comment: it may be a bit better to clarify in the public > > > > > >> interface section that the value of the newly added metric is > > > > determined > > > > > >> based by applying that formula across all compactable segments. 
> > For > > > > > >> example: > > > > > >> > > > > > >> The maximum value of Math.max(now - > > > > > >> earliest_timestamp_in_ms_of_uncompacted_segment - > > > > max.compaction.lag.ms > > > > > , > > > > > >> 0)/1000 across all compactable partitions, where the > > > > > >> max.compaction.lag.ms > > > > > >> can be overridden on per-topic basis. > > > > > >> > > > > > >> > > > > > >> > > > > > >> On Fri, Nov 9, 2018 at 5:16 PM xiongqi wu > > > > wrote: > > > > > >> > > > > > >> > Thanks Joel. > > > &
Re: Problem in CI for pull request
Try typing "retest this please" as a comment to the PR. best, Colin On Wed, Nov 28, 2018, at 11:05, lk gen wrote: > Hi, > > I made a pull request and it passed CI on JDK 11 but failed on JDK 8 > > I think the JDK 8 error may not related to my commit but an environment > problem on the CI > > How can I rerun the CI for my pull request ? > > The pull request is at > https://github.com/apache/kafka/pull/5960 > > error states > > *19:27:48* ERROR: H36 is offline; cannot locate JDK 1.8 > (latest)*19:27:48* ERROR: H36 is offline; cannot locate Gradle 4.8.1 > > > Thanks
Re: [DISCUSS] KIP-388 Add observer interface to record request and response
On Thu, Nov 29, 2018, at 01:15, Lincong Li wrote: > Hi everyone, > > Thanks for all feedback on this KIP. I have had some lengthy offline > discussions with Dong, Joel and other insightful developers. I updated > KIP > 388 > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-388%3A+Add+observer+interface+to+record+request+and+response> > and > proposed a different way of recording each request and response. Here is > a > summary of the change. > > Instead of having interfaces as wrapper on AbstractRequest and > AbstractResponse, I provided an interface on the Struct class which > represents the Kafka protocol format ("wire format"). The interface is > called ObservableStruct and it provides a set of getters that allow user to > extract information from the internal Struct instance. Some other possible > approaches are discussed in the KIP as well. But after lots of thinking, I > think the currently proposed approach is the best one. > > Why is this the best approach? > 1. *It's the most general way* to intercept/observe each request/response > with any type since each request/response must be materialized to a Struct > instance at some point in their life cycle. > > 2. *It's the easiest-to-maintain interface*. There is basically only one > interface (ObservableStruct) and its implementation (ObservableStructImp) > to maintain. Due to the fact that this interface essentially defines a set > of ways to get field(s) from the Struct, that means even changes on the > structure of the Structure (wire format changes) is not going to cause the > interface to change. > > 3. Struct represents the Kafka protocol format which is public. Expecting > users to have knowledge of the format of the kind of request/response they > are recording is reasonable. Moreover, *the proposed interfaces freeze the > least amount of internal implementation details into public APIs by only > exposing ways of extracting information on the Struct class*. > > I am aware that adding this broker-side instrumentation would touch other > sensitive modules and trigger many discussions on design trade-offs and > etc. I appreciate all of your effort trying to make it happen and truly > believe in the benefits it can bring to the community. > > Thanks, > Lincong Li Hi Lincong, Thanks for thinking through this problem. It's a tough one! In general, we would like to get rid of the Struct classes and just directly translate Java data structures to sequences of bytes (and vice versa). The reason is that going from byte sequence to Struct to internal data structure is an extra step. This extra step generates lots of garbage for the garbage collector to clean up. This is especially true for large messages like FetchRequest. In addition to the memory cost, this is extra code and extra operations that we don't really need to perform. This issue has shown up a lot when we profile Kafka performance. Since the amount of extra garbage scales with the number of partitions, it's especially bad in large enterprise clusters. best, Colin > > > On Sun, Nov 18, 2018 at 9:35 PM Dong Lin wrote: > > > Hey Lincong, > > > > Thanks for the explanation. Here are my concern with the current proposal: > > > > 1) The current APIs of RequestInfo/ResponseInfo only provide byte and > > count number of ProduceRequest/FetchRespnse. 
With these limited AIPs, > > developers will likely have to create new KIP and make change in Apache > > Kafka source code in order to implement more advanced observer plugin, > > which would considerably reduces the extensibility and customizability of > > observer plugins: > > > > Here are two use-cases that can be made possible if we can provide the raw > > request/response to the observer interface: > > > > - Get the number of bytes produced per source host. This is doable if > > plugin can get the original ProduceRequest, deserialize request into Kafka > > messages, and parse messages based on the schema of the message payload. > > > > - Get the ApiVersion supported per producer/consumer IPs. In the future we > > can add the version of client library in ApiVersionsRequest and observer > > can monitor whether there is still client library that is using very old > > version, and if so, what is their IP addresses. > > > > 2) It requires extra maintenance overhead for Apache Kafka developer to > > maintain implementation of RequestInfo (e.g. bytes produced per topic), > > which would not be necessary if we can just provide ProduceRequest to the > > observer interface. > > > > 3) It is not clear why we need RequestInfo/ResponseInfo needs to be > > int
Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets
Hi Sönke, One path forward would be to forbid the new ACL types from being created until the inter-broker protocol had been upgraded. We'd also have to figure out how the new ACLs were stored in ZooKeeper. There are a bunch of proposals in this thread that could work for that-- I really hope we don't keep changing the ZK path each time there is a version bump. best, Colin On Thu, Nov 29, 2018, at 14:25, Sönke Liebau wrote: > This has been dormant for a while now, can I interest anybody in chiming in > here? > > I think we need to come up with an idea of how to handle changes to ACLs > going forward, i.e. some sort of versioning scheme. Not necessarily what I > proposed in my previous mail, but something. > Currently this fairly simple change is stuck due to this being unsolved. > > I am happy to move forward without addressing the larger issue (I think the > issue raised by Colin is valid but could be mitigated in the release > notes), but that would mean that the next KIP to touch ACLs would inherit > the issue, which somehow doesn't seem right. > > Looking forward to your input :) > > Best regards, > Sönke > > On Tue, Jun 19, 2018 at 5:32 PM Sönke Liebau > wrote: > > > Picking this back up, now that KIP-290 has been merged.. > > > > As Colin mentioned in an earlier mail this change could create a > > potential security issue if not all brokers are upgraded and a DENY > > Acl based on an IP range is created, as old brokers won't match this > > rule and still allow requests. As I stated earlier I am not sure > > whether for this specific change this couldn't be handled via the > > release notes (see also this comment [1] from Jun Rao on a similar > > topic), but in principle I think some sort of versioning system around > > ACLs would be useful. As seen in KIP-290 there were a few > > complications around where to store ACLs. To avoid adding ever new > > Zookeeper paths for future ACL changes a versioning system is probably > > useful. > > > > @Andy: I've copied you directly in this mail, since you did a bulk of > > the work around KIP-290 and mentioned potentially picking up the > > follow up work, so I think your input would be very valuable here. Not > > trying to shove extra work your way, I'm happy to contribute, but we'd > > be touching a lot of the same areas I think. > > > > If we want to implement a versioning system for ACLs I see the > > following todos (probably incomplete & missing something at the same > > time): > > 1. ensure that the current Authorizer doesn't pick up newer ACLs > > 2. add a version marker to new ACLs > > 3. change SimpleACLAuthorizer to know what version of ACLs it is > > compatible with and only load ACLs of this / smaller version > > 4. Decide how to handle if incompatible (newer version) ACLs are > > present: log warning, fail broker startup, ... > > > > > > Post-KIP-290 ACLs are stored in two places in Zookeeper: > > /kafka-acl-extended - for ACLs with wildcards in the resource > > /kafka-acl - for literal ACLs without wildcards (i.e. * means * not > > any character) > > > > To ensure 1 we probably need to move to a new directory once more, > > call it /kafka-acl-extended-new for arguments sake. Any ACL stored > > here would get a version number stored with it, and only > > SimpleAuthorizers that actually know to look here would find these > > ACLs and also know to check for a version number. 
I think Andy > > mentioned moving the resource definition in the new ACL format to JSON > > instead of simple string in a follow up PR, maybe these pieces of work > > are best tackled together - and if a new znode can be avoided even > > better. > > > > This would allow us to recognize situations where ACLs are defined > > that not all Authorizers can understand, as those Authorizers would > > notice that there are ACLs with a larger version than the one they > > support (not applicable to legacy ACLs up until now). How we want to > > treat this scenario is up for discussion, I think make it > > configurable, as customers have different requirements around > > security. Some would probably want to fail a broker that encounters > > unknown ACLs so as to not create potential security risks t others > > might be happy with just a warning in the logs. This should never > > happen, if users fully upgrade their clusters before creating new ACLs > > - but to counteract the situation that Colin described it would be > > useful. > > > > Looking forward, a migration option might be added to the kaf
Re: [EXTERNAL] - Re: [DISCUSS] KIP-387: Fair Message Consumption Across Partitions in KafkaConsumer
Hi ChienHsing Wu, Maybe I'm misunderstanding something, but I'm not sure I see the need for a KIP here. You can just set max.partition.fetch.bytes to a very small value. That will cause Kafka to fetch only one message from each partition. This will give you the round robin behavior you want. Alternately, if you don't want to change max.partition.fetch.bytes, you could do your own buffering to get round robin behavior. Keep a buffer of messages from partition A, B, C, and D and hold back the messages from A, B, and C until one from D arrives, so that the A B C D A B C D... etc. order always repeats. best, Colin On Wed, Dec 19, 2018, at 09:00, ChienHsing Wu wrote: > Looking back the email thread I think one of the comments from Mayuresh > was the question about needing KIP for this change or not as the > KafkaConsumer does not guarantee the end user any order, and so no > changes to the contracts to users. > > I entered KIP based on suggestions from the attached email when going > through code contribution process. I am not sure what to do next in this > KIP process. Could anyone please help/advise me on what to do next? > > Thanks! > > CH > > -Original Message- > From: ChienHsing Wu > Sent: Wednesday, December 12, 2018 1:05 PM > To: dev@kafka.apache.org > Subject: RE: [EXTERNAL] - Re: [DISCUSS] KIP-387: Fair Message > Consumption Across Partitions in KafkaConsumer > > Good to know that, Thanks! > > Nonetheless, that introduces additional complexity at the client side > for a common expectation to more or less receives records in a fair > fashion. > > CH > > -Original Message- > From: Mayuresh Gharat > Sent: Wednesday, December 12, 2018 12:55 PM > To: dev@kafka.apache.org > Subject: Re: [EXTERNAL] - Re: [DISCUSS] KIP-387: Fair Message > Consumption Across Partitions in KafkaConsumer > > Hi ChienHsing, > > We are actually working on buffering the already fetched data for paused > topicPartitions, so ideally it should not have any effect on > performance. > Associated jira : > https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_KAFKA-2D7548&d=DwIFaQ&c=ZgVRmm3mf2P1-XDAyDsu4A&r=Az03wMrbL9ToLW0OFyo3wo3985rhAKPMLmmg6RA3V7I&m=7eC1W-f8nKkXMGJti3n0zF4qDV0af8y5uOWVIftTJ-U&s=_ERDVQqqt9Grnxt7DDO_gC9CvpD_ylhH8ZoHLwSXEpU&e= > > Thanks, > > Mayuresh > > On Wed, Dec 12, 2018 at 6:01 AM ChienHsing Wu wrote: > > > Hi Mayuresh, > > > > Thanks for the input! > > > > Pausing and Resuming are cumbersome and has some undesirable > > performance impact since pausing will in effect clean up the completed > > fetch and resuming will call the broker to retrieve again. > > > > The way I changed the code was just to parse the completed fetch > > earlier and ensure the order to retrieve are the same as the completed > > fetch queue. > > I did make code changes to take into account the following in Fetcher class. > > > > 1) exception handling > > 2) ensure the parsed partitions are not included in > > fetchablePartitions > > 3) clear buffer when not in the newly assigned partitions in > > clearBufferedDataForUnassignedPartitions > > 4) close them properly in close method > > > > Though the consumer does not guarantee explicit order, KIP 41 (link > > below) did intend to ensure fair distribution and therefore the round > > robin algorithm in the code. The change I propose was to enhance it. 
> > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_ > > confluence_display_KAFKA_KIP-2D41-253A-2BKafkaConsumer-2BMax-2BRecords > > -23KIP-2D41-3AKafkaConsumerMaxRecords-2DEnsuringFairConsumption&d=DwIF > > aQ&c=ZgVRmm3mf2P1-XDAyDsu4A&r=Az03wMrbL9ToLW0OFyo3wo3985rhAKPMLmmg6RA3 > > V7I&m=7eC1W-f8nKkXMGJti3n0zF4qDV0af8y5uOWVIftTJ-U&s=NKZHA5HVggfKWlF_yg > > 6V3-Wyf_Z6x7n1HQPQ1_M0d9A&e= > > > > As for performance, the changes does not add any additional calls to > > the broker nor does it introduce significant processing logic; it just > > parses the completed fetch earlier and have a list to manage them. > > > > > > CH > > > > -Original Message- > > From: Mayuresh Gharat > > Sent: Tuesday, December 11, 2018 6:58 PM > > To: dev@kafka.apache.org > > Subject: Re: [EXTERNAL] - Re: [DISCUSS] KIP-387: Fair Message > > Consumption Across Partitions in KafkaConsumer > > > > Hi ChienHsing, > > > > The other way I was thinking, this can be done outside of > > KafkaConsumer is by pausing and resuming TopicPartitions (m
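A minimal sketch of the first suggestion above (shrinking max.partition.fetch.bytes so that each fetch returns at most one record batch per partition, which evens out consumption across partitions); the broker address, group id, and topic name are placeholders:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SmallFetchConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "round-robin-demo");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Very small per-partition fetch size: each fetch returns at most one
            // record batch per partition (the first batch is always returned even
            // if it is larger than this limit).
            props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "1");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("%s-%d@%d: %s%n", record.topic(), record.partition(),
                                record.offset(), record.value());
                    }
                }
            }
        }
    }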
Re: [VOTE] KIP-349 Priorities for Source Topics
Hi all, Just as a quick reminder, this is not really a complete proposal. There are a bunch of unresolved issues with this KIP. One example is how this interacts with incremental fetch sessions. It is not mentioned anywhere in the KIP text. Previously we discussed some approaches, but there was no clear consensus. Another example is the issue of starvation. The KIP discusses "an idea" for handling starvation, but the details are very sparse-- just a sentence or two. At minimum we would need some kind of configuration for the proposed "lag deltas". It's also not clear that the proposed mechanism would work, since we don't receive lag metrics for partitions that we don't fetch. But if we do fetch from the partitions, we may receive data, which would cause our policy to not be strict priorities. Keep in mind, even attempting to fetch 1 byte may cause us to read an entire message, as described in KIP-74. It seems that we don't understand the potential use-cases. The only use-case referenced by the KIP is this one, by Bala Prassanna: > We use Kafka to process the asynchronous events of our Document Management > System such as preview generation, indexing for search etc. > The traffic gets generated via Web and Desktop Sync application. In such > cases, we had to prioritize the traffic from web and consume them first. > But this might lead to the starvation of events from sync if the consumer > speed is slow and the event rate is high from web. A solution to handle > the starvation with a timeout after which the events are consumed normally > for a specified period of time would be great and help us use our > resources effectively. Reading this carefully, it seems that the problem is actually starvation, not implementing priorities. Bala already implemented priorities outside of Kafka. If you read the discussion on KAFKA-6690, Bala also makes this comment: "We would need this in both Consumer API and Streams API." The current KIP does not discuss adding priorities to Streams-- only to the basic consumer API. So it seems clear that KIP-349 does not address Bala's use-case at all. Stepping back a little bit, it seems like a few people have spoken up recently asking for some way to re-order the messages they receive from the Kafka consumer. For example, ChienHsing Wu has discussed a use-case where he wants to receive messages in a "round robin" order. All of this is possible by doing some local buffering and using the pause and resume APIs. Perhaps we should consider better documenting these APIs, and adding some examples. Or perhaps we should consider some kind of API to do pluggable buffering on the client side. In any case, this needs more discussion. We need to be clear and definite about what use cases we want to solve, and the tradeoffs we're making to solve them. For now, I have to reiterate my -1 (binding). Colin On Thu, Jan 10, 2019, at 10:46, Adam Bellemare wrote: > Looks good to me then! > > +1 non-binding > > > > > On Jan 10, 2019, at 1:22 PM, Afshartous, Nick > > wrote: > > > > > > Hi Adam, > > > > > > This change is only intended for the basic consumer API. > > > > > > Cheers, > > > > -- > > > >Nick > > > > > > > > From: Adam Bellemare > > Sent: Sunday, January 6, 2019 11:45 AM > > To: dev@kafka.apache.org > > Subject: Re: [VOTE] KIP-349 Priorities for Source Topics > > > > Hi Nick > > > > Is this change only for the basic consumer? How would this affect anything > > with Kafka Streams?
> > > > Thanks > > > > > >> On Jan 5, 2019, at 10:52 PM, n...@afshartous.com wrote: > >> > >> Bumping again for more votes. > >> -- > >> Nick > >> > >> > >>> On Dec 26, 2018, at 12:36 PM, n...@afshartous.com wrote: > >>> > >>> Bumping this thread for more votes > >>> > >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_KAFKA_KIP-2D349-3A-2BPriorities-2Bfor-2BSource-2BTopics&d=DwIFAg&c=-SicqtCl7ffNuxX6bdsSog&r=P28z_ShLjFv5AP-w9-b_auYBx8qTrjk2JPYZKbjmJTs&m=5qg4fCOVMtRYYLu2e8h8KmDyis_uk3aFqT5Eq0x4hN8&s=Sbrd5XSwEZiMc9iTPJjRQafl4ubXwIOnsnFzhBEa0h0&e= > >>> > >>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_KAFKA_KIP-2D349-3A-2BPriorities-2Bfor-2BSource-2BTopics&d=DwIFAg&c=-SicqtCl7ffNuxX6bdsSog&r=P28z_ShLjFv5AP-w9-b_auYBx8qTrjk2JPYZKbjmJTs&m=5qg4fCOVMtRYYLu2e8h8KmDyis_uk3aFqT5Eq0x4hN8&s=Sbrd5XSwEZiMc9iTPJjRQaf
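Since local buffering with pause and resume comes up several times in this thread, here is a rough sketch of that pattern. The topic name and the "at most one record per partition per pass" policy are illustrative choices only:

    import java.time.Duration;
    import java.util.ArrayDeque;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Queue;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PauseResumeRoundRobinSketch {
        public static void run(KafkaConsumer<String, String> consumer) {
            Map<TopicPartition, Queue<ConsumerRecord<String, String>>> buffers = new HashMap<>();
            consumer.subscribe(Collections.singletonList("my-topic"));

            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(200))) {
                    TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                    buffers.computeIfAbsent(tp, k -> new ArrayDeque<>()).add(record);
                }

                for (Map.Entry<TopicPartition, Queue<ConsumerRecord<String, String>>> e : buffers.entrySet()) {
                    // Pause partitions that already have buffered data, so one busy
                    // partition cannot crowd the others out of the next fetch.
                    if (!e.getValue().isEmpty() && consumer.assignment().contains(e.getKey())) {
                        consumer.pause(Collections.singleton(e.getKey()));
                    }
                }

                for (Map.Entry<TopicPartition, Queue<ConsumerRecord<String, String>>> e : buffers.entrySet()) {
                    // Hand out at most one record per partition per pass: a simple round robin.
                    ConsumerRecord<String, String> record = e.getValue().poll();
                    if (record != null) {
                        System.out.printf("%s: %s%n", e.getKey(), record.value());
                    }
                    if (e.getValue().isEmpty() && consumer.assignment().contains(e.getKey())) {
                        consumer.resume(Collections.singleton(e.getKey()));
                    }
                }
            }
        }
    }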
Re: [VOTE] KIP-349 Priorities for Source Topics
On Sun, Jan 13, 2019, at 18:13, n...@afshartous.com wrote: > Thanks Colin and Mathias. > > > On Jan 12, 2019, at 8:27 PM, Matthias J. Sax wrote: > > > > Thus, I would suggest to limit this KIP to the consumer only, otherwise, > > the scope will be too large and this KIP will drag on even longer. If we > > really want to add this to Kafka Streams, I expect a long and difficult > > discussion about this by itself, and thus, doing this in a follow up KIP > > (if there is any demand) seems to be the better approach. > > > > Agreed, and my intent is to limit the scope to the consumer. > > > About the starvation issue: maybe it's a bold claim, but is a potential > > starvation of a low-priority topic not intended by design if topics have Yeah, I was thinking this as well. If you want strict priority behavior, high priority tasks should always take priority over low priority ones. > > On reflection, it would be hard to describe the semantics of an API > that tried to address starvation by temporarily disabling > prioritization, and then oscillating back and forth. > Thus I agree that it makes sense not to try and address starvation to > Mathias’ point that this is intended by design. The KIP has been > updated to reflect this by removing the second method. Yeah, I agree with that. The problem is, the actual policy you want is kind of complex. It's probably better in the application rather than in Kafka. > > Regarding incremental fetch, Colin do you have any suggestion on which > option to adopt or how to proceed ? I think it makes sense to go back to use-cases again. So far, all of the use-cases we discussed could be handled by pause and resume. So it makes sense to try to figure out what the issue with those APIs is. Are they not well-documented enough? Is there something higher-level we could build on top to make them easier to use? It would be better to wait until a user comes forward with a case where priorities are needed to implement them, since then we would know more about what the API should be, etc. best, Colin
Re: [DISCUSS] KIP-402: Improve fairness in SocketServer processors
Hi Rajini, Thanks for this. The KIP looks really useful. > > A new metric will be added to track the amount of time Acceptor is blocked > from accepting connections due to backpressure. This will be a yammer > Meter, consistent with other SocketServer metrics. > > kafka.network:type=Acceptor,name=AcceptorIdlePercent,listener={listenerName} > Hmm. I was a bit confused by this. When the acceptor is not accepting connections because there are none coming in, does that count as idle? When the acceptor is not accepting connections because the connect rate is being backpressured, does that count as idle? Would it be more intuitive to have a metric titled AcceptorBackpressuredPercent? Also, I sort of wonder if titling this "Limit incoming connection rate" or similar would be clearer than "improving fairness." I guess it is unfair that a lot of incoming connections can swamp the network threads right now. But limiting the rate of new connections is unfair to people connecting. Overall the goal seems to be usability, not fairness. best, Colin On Tue, Jan 15, 2019, at 04:27, Rajini Sivaram wrote: > Hi Jan, > > If the queue of one Processor is full, we move to the next Processor > immediately without blocking. So as long as the queue of any Processor is > not full, we accept the connection immediately. If the queue of all > Processors are full, we assign a Processor and block until the connection > can be added. There is currently no timeout for this. The PR is here: > https://github.com/apache/kafka/pull/6022 > > Thanks, > > Rajini > > On Tue, Jan 15, 2019 at 12:02 PM Jan Filipiak > wrote: > > > > > > > > The connection queue for Processors will be changed to > > ArrayBlockingQueue with a fixed size of 20. Acceptor will use round-robin > > allocation to allocate each new connection to the next available Processor > > to which the connection can be added without blocking. If a Processor's > > queue is full, the next Processor will be chosen. If the connection queue > > on all Processors are full, Acceptor blocks until the connection can be > > added to the selected Processor. No new connections will be accepted during > > this period. The amount of time Acceptor is blocked can be monitored using > > the new AcceptorIdlePercent metric. > > > > So if the queue of one Processor is full, what is the strategy to move > > to the next queue? Are we using offer with a timeout here? How else can > > we make sure that a single slow processor will not block the entire > > processing? I assume we do not allow us to get stuck during put as you > > mention that all queues full is a scenario. I think there is quite some > > uncertainty here. Is there any code one could check out? > > > > Best Jan > >
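As a rough illustration of the queueing behaviour Rajini describes above (non-blocking round-robin offers over per-Processor queues, falling back to a blocking put only when every queue is full), assuming nothing more than a plain ArrayBlockingQueue per processor:

    import java.nio.channels.SocketChannel;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ConnectionQueueSketch {
        private final List<BlockingQueue<SocketChannel>> processorQueues = new ArrayList<>();
        private int nextProcessor = 0;

        public ConnectionQueueSketch(int numProcessors, int queueSize) {
            for (int i = 0; i < numProcessors; i++) {
                processorQueues.add(new ArrayBlockingQueue<>(queueSize));  // e.g. queueSize = 20
            }
        }

        /** Assign a new connection, preferring any processor whose queue has room. */
        public void assign(SocketChannel channel) throws InterruptedException {
            // Try each processor once, round-robin, without blocking.
            for (int attempt = 0; attempt < processorQueues.size(); attempt++) {
                BlockingQueue<SocketChannel> queue = processorQueues.get(nextProcessor);
                nextProcessor = (nextProcessor + 1) % processorQueues.size();
                if (queue.offer(channel)) {
                    return;
                }
            }
            // All queues are full: block on the selected processor's queue. This is the
            // interval that the acceptor blocked/idle metric discussed in this thread
            // would measure.
            BlockingQueue<SocketChannel> queue = processorQueues.get(nextProcessor);
            nextProcessor = (nextProcessor + 1) % processorQueues.size();
            queue.put(channel);
        }
    }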
[DISCUSS] 2.1.1 bug-fix release
Hi all, I'd like to volunteer to be the release manager for the 2.1.1 bug fix release. 2.1 was released November 20, 2018. There are 34 fixes scheduled for inclusion in 2.1.1 so far. Please find all the resolved tickets here: https://issues.apache.org/jira/browse/KAFKA-7818?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%202.1.1%20%20 Please find the Release plan: https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.1.1 regards, Colin
Re: [DISCUSS] KIP-402: Improve fairness in SocketServer processors
On Tue, Jan 15, 2019, at 12:59, Rajini Sivaram wrote: > Hi Colin, > > `AcceptorIdlePercent` indicates the total amount of time the acceptor is > inactive and not accepting any connections because it is blocked on > Processors. But I agree the name could be improved. There is back pressure > at the Java level (which we can't monitor) and back pressure we apply with > blocking queues which the metric refers to. Perhaps > `AcceptorBlockedPercent`reflects > that it is the time that the acceptor is blocked? Yeah, I like "AcceptorBlockedPercent" better. > > As Ismael said, fairness refers to the distribution of time between > processing of new connections and processing of existing connections. OK. best, Colin > > Thanks, > > Rajini > > > On Tue, Jan 15, 2019 at 7:56 PM Ismael Juma wrote: > > > I think the point is that we distribute the time more fairly between > > connection handling and other operations where before we could block on the > > TLS handshake for a long time given a large number of connections. > > > > Ismael > > > > On Tue, Jan 15, 2019 at 11:39 AM Colin McCabe wrote: > > > > > Hi Rajini, > > > > > > Thanks for this. The KIP looks really useful. > > > > > > > > > > > A new metric will be added to track the amount of time Acceptor is > > > blocked > > > > from accepting connections due to backpressure. This will be a yammer > > > > Meter, consistent with other SocketServer metrics. > > > > > > > > > > > > > kafka.network:type=Acceptor,name=AcceptorIdlePercent,listener={listenerName} > > > > > > > > > > Hmm. I was a bit confused by this. When the acceptor is not accepting > > > connections because there are none coming in, does that count as idle? > > > When the acceptor is not accepting connections because the connect rate > > is > > > being backpressured, does that count as idle? Would it would be more > > > intuitive to have a metric titled AcceptorBackpressuredPercent? > > > > > > Also, I sort of wonder if titling this "Limit incoming connection > > > connection rate" or similar would be clearer than "improving fairness." > > I > > > guess it is unfair that a lot of incoming connections can swamp the > > network > > > threads right now. But limiting the rate of new connections is unfair to > > > people connecting. Overall the goal seems to be usability, not fairness. > > > > > > best, > > > Colin > > > > > > > > > > > > On Tue, Jan 15, 2019, at 04:27, Rajini Sivaram wrote: > > > > Hi Jan, > > > > > > > > If the queue of one Processor is full, we move to the next Processor > > > > immediately without blocking. So as long as the queue of any Processor > > is > > > > not full, we accept the connection immediately. If the queue of all > > > > Processors are full, we assign a Processor and block until the > > connection > > > > can be added. There is currently no timeout for this. The PR is here: > > > > https://github.com/apache/kafka/pull/6022 > > > > > > > > Thanks, > > > > > > > > Rajini > > > > > > > > On Tue, Jan 15, 2019 at 12:02 PM Jan Filipiak < > > jan.filip...@trivago.com> > > > > wrote: > > > > > > > > > > > > > > > > > > > > The connection queue for Processors will be changed to > > > > > ArrayBlockingQueue with a fixed size of 20. Acceptor will use > > > round-robin > > > > > allocation to allocate each new connection to the next available > > > Processor > > > > > to which the connection can be added without blocking. If a > > Processor's > > > > > queue is full, the next Processor will be chosen. 
If the connection > > > queue > > > > > on all Processors are full, Acceptor blocks until the connection can > > be > > > > > added to the selected Processor. No new connections will be accepted > > > during > > > > > this period. The amount of time Acceptor is blocked can be monitored > > > using > > > > > the new AcceptorIdlePercent metric. > > > > > > > > > > So if the queue of one Processor is full, what is the strategy to > > move > > > > > to the next queue? Are we using offer with a timeout here? How else > > can > > > > > we make sure that a single slow processor will not block the entire > > > > > processing? I assume we do not allow us to get stuck during put as > > you > > > > > mention that all queues full is a scenario. I think there is quite > > some > > > > > uncertainty here. Is there any code one could check out? > > > > > > > > > > Best Jan > > > > > > > > > > > > > > >
Re: [VOTE] KIP-402: Improve fairness in SocketServer processors
+1 (binding) Thanks, Colin On Tue, Jan 15, 2019, at 18:55, Gwen Shapira wrote: > +1 > Thank you! > > > On Tue, Jan 15, 2019, 3:38 PM Rajini Sivaram > > > Hi all, > > > > I would like to start vote on KIP-402 to improve fairness in channel> > > > processing in SocketServer to protect brokers from connection > > storms and> > limit the total number of connections in brokers to avoid > > OOM. The > > KIP is> > here: > > > >- > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-402%3A+Improve+fairness+in+SocketServer+processors> > > > > > > > Thanks, > > > > Rajini > >
Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient
On Fri, May 4, 2018, at 05:49, Viktor Somogyi wrote: > Hi Colin, > > > Rather than breaking compatibility, we should simply add a new > > "incremental" boolean to AlterConfigsOptions. Callers can set this boolean > > to true when they want the update to be incremental. It should default to > > false so that old code continues to work. > > Agreed, let's do it this way. > > > Hmm. I don't think AlterOperation is necessary. If the user wants to > > delete a configuration key named "foo", they can create a ConfigEntry with > > name = "foo", value = null. > > AlterConfig's config type currently is string, so the only possibility > is to use an empty string as changing the type to nullable_string > could be breaking if the client code doesn't expect -1 as the string > size. In the discussion thread earlier we had a conversation about > this with Rajini, let me paste it here (so it gives some context). At > that point I had the text "" for this functionality: Hi Viktor, We are going to need to create a new version of AlterConfigsRequest to add the "incremental" boolean. So while we're doing that, maybe we can change the type to NULLABLE_STRING. > "4. We use "" internally to store default quotas and other > defaults. But I don't think we should externalise that string. We use empty > string elsewhere for indicating default, we can do the same here. Hmm. Not sure I follow. KIP-133 doesn't use the empty string or "" to indicate defaults, does it? There is a ConfigEntry class: > @InterfaceStability.Evolving > public class ConfigEntry { > > private final String name; > private final String value; > private final ConfigSource source; > private final boolean isSensitive; > private final boolean isReadOnly; > private final List synonyms; and the ConfigSource enum indicates where the config came from: > /** > * Source of configuration entries. > */ > public enum ConfigSource { > DYNAMIC_TOPIC_CONFIG, // dynamic topic config that is > configured for a specific topic > DYNAMIC_BROKER_CONFIG, // dynamic broker config that is > configured for a specific broker > DYNAMIC_DEFAULT_BROKER_CONFIG, // dynamic broker config that is > configured as default for all brokers in the cluster > STATIC_BROKER_CONFIG, // static broker config provided as > broker properties at start up (e.g. server.properties file) > DEFAULT_CONFIG, // built-in default configuration > for configs that have a default value > UNKNOWN // source unknown e.g. in the > ConfigEntry used for alter requests where source is not set > } This comes from DescribeConfigsResponse. Unless I'm missing something, I think your suggestion to not expose "" is already implemented? > And we use STRING rather than NULLABLE_STRING in describe configs etc. So we > should probably do the same for quotas." I think nearly all responses treat ERROR_MESSAGE as a nullable string. CommonFields#ERROR_MESSAGE, which is used by most of them, is a nullable string. It's DescribeConfigsResponse that is the black sheep here. > public static final Field.NullableStr ERROR_MESSAGE = new > Field.NullableStr("error_message", "Response error message"); > > > Yeah, this might be an excessive maintenance burden. Maybe we should get > > rid of the old zookeeper-based code, and just move towards having only a > > KIP-248-based tool. It's a breaking change, but it's clear to users that > > it's occurring, and what the fix is (specifying --bootstrap-server instead > > of --zookeeper). 
> > Earlier Rajini raised a concern that direct zookeeper interaction is > required to add the SCRAM credentials which will be used for > validation if inter-broker communication uses this auth method. This > is currently done by the ConfigCommand. Therefore we can't completely > get rid of it yet either. > > In my opinion though on a longer term (and this is now a bit > off-topic) Kafka shouldn't use Zookeeper as a credentials store, just > provide an interface, so 3rd party authentication stores could be > implemented. Then similarly to the authorizer we could have Zookeeper > as a default though and a client that manages SCRAM credentials in ZK. > From this perspective I'd leave the the command there but put a > warning that the tool is deprecated and should only be used for > setting up SCRAM credentials. > What do you think? What about
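To make the two suggestions near the top of this message concrete, here is a sketch of what a caller might write. The incremental(true) call shown in the comment is hypothetical (it does not exist); the rest uses existing AdminClient types, with placeholder broker and topic names:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigsOptions;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class AlterConfigsSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");

                // Under the proposal above, a null value would mean "delete this key".
                Config changes = new Config(Collections.singletonList(
                        new ConfigEntry("retention.ms", null)));

                AlterConfigsOptions options = new AlterConfigsOptions().timeoutMs(30000);
                // Hypothetical, per the proposal -- not an existing method:
                // options.incremental(true);

                Map<ConfigResource, Config> updates = Collections.singletonMap(topic, changes);
                admin.alterConfigs(updates, options).all().get();
            }
        }
    }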
Re: [DISCUSS] KIP-273 Kafka to support using ETCD beside Zookeeper
Hi Molnar, The points Ismael brought up earlier (and that were brought up on KIP-30) are still relevant here. As Ismael said, the goal is to get rid of external dependencies here. We're going to post more about this soon (sorry for the delay) thanks, Colin On Wed, May 9, 2018, at 07:29, Molnár Bálint wrote: > Hi, > I just rebased the Etcd implementation proposal on trunk. Pinging to see if > anyone has feedback on my questions from my previous email. > > Molnár Bálint ezt írta (időpont: 2018. ápr. 4., > Sze, 10:08): > > > Hi, > > Thanks again for the feedback. > > > > Is there already ongoing work for having an own consensus implementation > > within Kafka? > > If that work haven't started yet, we think there is value in having an > > interim solution, that allows the use of another consensus system besides > > Zookeeper. > > > > We ask the community to take a look at the Etcd implementation proposal > > we created and provide feedback on that. > > This helps to asses rather this approach is viable at all. > > > > We are open to collaborate on integrating our proposed Etcd implementation > > into any integration test system, to certify that all use cases works as > > expected. > > > > Balint > > > > 2018-03-30 22:21 GMT+02:00 Gwen Shapira : > > > >> Hi, > >> > >> I had an offline discussion with Ismael and wanted to summarize the > >> comments and questions he raised so we are all on the same page. > >> > >> The core issue is that this change adds a new public API. Since we already > >> know that the goal for the next 1-2 years is to get rid of ZK completely. > >> Do we want to go to the effort of adding (and discussing and reviewing) a > >> new public API, knowing that it will be completely removed in a year? And > >> since building and testing a plugin also involves effort, will anyone do > >> it > >> for something that is going to be temporary by design? > >> > >> Ismael, correct me if this isn't a fair representation of your concerns. > >> > >> Gwen > >> > >> > >> > >> On Thu, Mar 29, 2018 at 9:33 AM, Gwen Shapira wrote: > >> > >> > Few other concerns that were raised in the previous discussion were > >> around > >> > the challenges both to maintainers and users in making this API > >> pluggable > >> > and how does making the interface pluggable aligns with future goals for > >> > the project. At the time this was difficult to discuss because there > >> wasn't > >> > a concrete proposal. I want to discuss these points in the context of > >> this > >> > specific proposal: > >> > > >> > 1. Problem: Pluggable APIs mean larger surface testing area and multiple > >> > implementations to cover. > >> > In this case: At the time, the Kafka project didn't have much > >> > experience with pluggable APIs and components, so the concerns were very > >> > valid. Right now Kafka has multiple pluggable components - Connectors, > >> > converters, transformations, authentication protocols, authorization > >> > database, coordination protocol, serializers, etc. I think that as a > >> > community we gotten better at testing the interface, testing the very > >> few > >> > implementations that are included in Apache Kafka itself and allowing > >> the > >> > community to innovate and validate outside of the Kafka project. I don't > >> > recall major issues either from lack of testing or from usability > >> > perspective. > >> > > >> > 2. Problem: Users don't want to choose a consensus implementation, they > >> > just don't want ZK. 
> >> > In this case: I agree that users don't actually want to spend time > >> > choosing consensus implementation and a simpler deployment model would > >> > serve them better. IMO, if Apache Kafka ships with our well-tested ZK > >> > implementation, 99% of the users will choose to use that (a vast > >> majority > >> > uses our less-than-amazing authorization plugin), and the few that > >> really > >> > need something else for whatever reason, will be able to get what they > >> > need. As Jake said, we need to face the fact that development > >> trajectory of > >> > ZK isn't amazing at the moment, that it is lacking features our users > >> need
Re: [DISCUSS] KIP-278: Add version option to Kafka's commands
+1. Thanks, Sasaki. Colin On Wed, May 9, 2018, at 09:15, Jason Gustafson wrote: > Hi Sasaki, > > Thanks for the update. The KIP looks good to me. I'd suggest moving to a > vote. > > Thanks, > Jason > > On Mon, May 7, 2018 at 7:08 AM, Sasaki Toru > wrote: > > > Hi Manikumar, Colin, > > > > Thank you for your comment. > > > > As Colin said, I proposed adding an option to show version information of > > "local" tool, > > because many software have this option and I think Kafka also needs this > > one. > > > > As you said, the function to show remote Kafka version is useful, > > but I think it is better to create new KIP because this function has some > > points which should be considered. > > > > If you have any better ideas, could you please tell us? > > > > > > Many thanks, > > Sasaki > > > > From: Manikumar > >> Date: 2018-05-03 4:11 GMT+09:00 > >> > >> Subject: Re: [DISCUSS] KIP-278: Add version option to Kafka's commands > >> To: dev > >> > >> > >> Hi Colin, > >> > >> Thanks for explanation. It's definitely useful to have --version flag. > >> > >> kafka-broker-api-versions.sh gives the API versions, not Kafka release > >> version. > >> Is not easy to figure out release version from API versions. Currently > >> release version is available via metric/JMX. > >> If required, we can implement this in future. > >> > >> > >> Thanks, > >> > >> On Wed, May 2, 2018 at 10:58 PM, Colin McCabe wrote: > >> > >> Hi Manikumar, > >>> > >>> We already have a tool for getting the Kafka broker API versions, > >>> "./bin/kafka-broker-api-versions.sh". It was added as part of KIP-97. > >>> > >>> What Saski is proposing here is having a way of getting the version of > >>> locally installed Kafka software, which may be different from the server > >>> version. Many pieces of software offer a --version flag, and it's always > >>> understood to refer to the local version of the software, not a version > >>> running somewhere else. The user has no way to get this information now, > >>> other than perhaps trying to look at he names of jar files. > >>> > >>> cheers, > >>> Colin > >>> > >>> On Tue, May 1, 2018, at 08:20, Manikumar wrote: > >>> > >>>> I assume the intent of the KIP to find out the Kafka broker version. In > >>>> this case, maybe we should expose > >>>> version using a Kafka request. This will help the remote scripts/tools > >>>> > >>> to > >> > >>> query the Kafka version. > >>>> scripts (kafka-topics.sh, kafka-configs.sh, etc..) may run from remote > >>>> machines and may use > >>>> older Kafka versions. In this case, current proposal prints on the older > >>>> version. > >>>> > >>>> On Tue, May 1, 2018 at 7:47 PM, Colin McCabe > >>>> wrote: > >>>> > >>>> Thanks, Sasaki. > >>>>> > >>>>> Colin > >>>>> > >>>>> On Sat, Apr 28, 2018, at 00:55, Sasaki Toru wrote: > >>>>> > >>>>>> Hi Colin, Jason, > >>>>>> > >>>>>> Thank you for your beneficial comment. > >>>>>> I have updated my Pull Request to show git commit hash in version > >>>>>> information.> In my current Pull Request, we cat get the result such > >>>>>> > >>>>> below: > >>>>> > >>>>>> $ bin/kafka-topics.sh --version > >>>>>> (snip) > >>>>>> 2.0.0-SNAPSHOT (Commit:f3876cd9617faf7e) > >>>>>> > >>>>>> > >>>>>> I have also updated to accept double-dash for this option (-- > >>>>>> version) only.> > >>>>>> > >>>>>> Many thanks, > >>>>>> Sasaki > >>>>>> > >>>>>> From: Jason Gustafson > >>>>>>> Date: 2018-04-25 9:42 GMT+09:00 > >>>>>>> Subject: Re: [DISCUSS] KIP-278: Add version option to Kafka's > >>>>>>> commands> > To: dev > >>>>>>>
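For reference, the version string shown above is the same information that the clients jar already exposes through AppInfoParser, so a tool-side flag can presumably be as small as something like this (the class name here is made up):

    import org.apache.kafka.common.utils.AppInfoParser;

    public class VersionFlagSketch {
        public static void main(String[] args) {
            if (args.length == 1 && args[0].equals("--version")) {
                // Prints something like "2.0.0-SNAPSHOT (Commit:f3876cd9617faf7e)".
                System.out.println(AppInfoParser.getVersion()
                        + " (Commit:" + AppInfoParser.getCommitId() + ")");
                return;
            }
            System.out.println("Run with --version to print the local Kafka version.");
        }
    }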
Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs
Hi Andy, The issue that I was trying to solve here is the Java API. Right now, someone can write "new ResourceFilter(ResourceType.TRANSACTIONAL_ID, "foo*") and have a ResourceFilter that applies to a Transactional ID named "foo*". This has to continue to work, or else we have broken compatibility. I was proposing that there would be something like a new function like ResourceFilter.fromPattern(ResourceType.TRANSACTIONAL_ID, "foo*") which would create a ResourceFilter that applied to transactional IDs starting with "foo", rather than transactional IDs named "foo*" specifically. I don't think it's important whether the Java class has an integer, an enum, or two string fields. The important thing is that there's a new static function, or new constructor overload, etc. that works for patterns rather than literal strings. On Thu, May 10, 2018, at 03:30, Andy Coates wrote: > Rather than having name and pattern fields on the ResourceFilter, where > it’s only valid for one to be set, and we want to restrict the character > set in case future enhancements need them, we could instead add a new > integer ‘nameType’ field, and use constants to indicate how the name > field should be interpreted, e.g. 0 = literal, 1 = wildcard. This would > be extendable, e.g we can later add 2 = regex, or what ever, and > wouldn’t require any escaping. This is very user-unfriendly, though. Users don't want to have to explicitly supply a version number when using the API, which is what this would force them to do. I don't think users are going to want to memorize that version 4 supported "+", whereas version 3 only supported "[0-9]", or whatever. Just as an example, do you remember which versions of FetchRequest added which features? I don't. I always have to look at the code to remember. Also, escaping is still required any time you overload a character to mean two things. Escaping is required in the current proposal to be able to create a pattern that matches only "foo*". You have to type "foo\*". It would be required if we forced users to specify a version, as well. best, Colin > > Sent from my iPhone > > > On 7 May 2018, at 05:16, Piyush Vijay wrote: > > > > Makes sense. I'll update the KIP. > > > > Does anyone have any other comments? :) > > > > Thanks > > > > > > Piyush Vijay > > > >> On Thu, May 3, 2018 at 11:55 AM, Colin McCabe wrote: > >> > >> Yeah, I guess that's a good point. It probably makes sense to support the > >> prefix scheme for consumer groups and transactional IDs as well as topics. > >> > >> I agree that the current situation where anything goes in consumer group > >> names and transactional ID names is not ideal. I wish we could rewind the > >> clock and impose restrictions on the names. However, it doesn't seem > >> practical at the moment. Adding new restrictions would break a lot of > >> existing users after an upgrade. It would be a really bad upgrade > >> experience. > >> > >> However, I think we can support this in a compatible way. From the > >> perspective of AdminClient, we just have to add a new field to > >> ResourceFilter. Currently, it has two fields, resourceType and name: > >> > >>> /** > >>> * A filter which matches Resource objects. > >>> * > >>> * The API for this class is still evolving and we may break > >> compatibility in minor releases, if necessary. > >>> */ > >>> @InterfaceStability.Evolving > >>> public class ResourceFilter { > >>>private final ResourceType resourceType; > >>>private final String name; > >> > >> We can add a third field, pattern. 
> >> > >> So the API will basically be, if I create a > >> ResourceFilter(resourceType=GROUP, > >> name=foo*, pattern=null), it applies only to the consumer group named > >> "foo*". If I create a ResourceFilter(resourceType=GROUP, name=null, > >> pattern=foo*), it applies to any consumer group starting in "foo". name > >> and pattern cannot be both set at the same time. This preserves > >> compatibility at the AdminClient level. > >> > >> It's possible that we will want to add more types of pattern in the > >> future. So we should reserve "special characters" such as +, /, &, %, #, > >> $, etc. These characters should be treated as special unless they are > >> prefixed with a backslash to escape them.
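To make the compatibility point concrete, here is an illustrative sketch of the "name XOR pattern" filter semantics being discussed; the class and field names are made up for the example and are not the final AdminClient API.

    // Illustrative only: a filter where at most one of name/pattern may be set.
    public class ResourceFilterSketch {
        private final String name;     // literal name: "foo*" means the group literally named foo*
        private final String pattern;  // prefix pattern: "foo*" means any name starting with foo

        public ResourceFilterSketch(String name, String pattern) {
            if (name != null && pattern != null)
                throw new IllegalArgumentException("name and pattern cannot both be set");
            this.name = name;
            this.pattern = pattern;
        }

        public boolean matches(String resourceName) {
            if (name != null)
                return name.equals(resourceName);          // exact match; '*' is not special here
            if (pattern != null && pattern.endsWith("*"))
                return resourceName.startsWith(pattern.substring(0, pattern.length() - 1));
            return pattern == null || pattern.equals(resourceName);
        }
    }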
Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs
Hi Andy, I see what you mean. I guess my thought here is that if the fields are private, we can change it later if we need to. I definitely agree that we should use the scheme you describe for sending ACLs over the wire (just the string + version number) cheers, Colin On Fri, May 11, 2018, at 09:39, Andy Coates wrote: > i think I'm agreeing with you. I was merely suggesting that having an > additional field that controls how the current field is interpreted is more > flexible / extensible in the future than using a 'union' style approach, > where only one of several possible fields should be populated. But it's a > minor thing. > > > > > > > On 10 May 2018 at 09:29, Colin McCabe wrote: > > > Hi Andy, > > > > The issue that I was trying to solve here is the Java API. Right now, > > someone can write "new ResourceFilter(ResourceType.TRANSACTIONAL_ID, > > "foo*") and have a ResourceFilter that applies to a Transactional ID named > > "foo*". This has to continue to work, or else we have broken compatibility. > > > > I was proposing that there would be something like a new function like > > ResourceFilter.fromPattern(ResourceType.TRANSACTIONAL_ID, "foo*") which > > would create a ResourceFilter that applied to transactional IDs starting > > with "foo", rather than transactional IDs named "foo*" specifically. > > > > I don't think it's important whether the Java class has an integer, an > > enum, or two string fields. The important thing is that there's a new > > static function, or new constructor overload, etc. that works for patterns > > rather than literal strings. > > > > On Thu, May 10, 2018, at 03:30, Andy Coates wrote: > > > Rather than having name and pattern fields on the ResourceFilter, where > > > it’s only valid for one to be set, and we want to restrict the character > > > set in case future enhancements need them, we could instead add a new > > > integer ‘nameType’ field, and use constants to indicate how the name > > > field should be interpreted, e.g. 0 = literal, 1 = wildcard. This would > > > be extendable, e.g we can later add 2 = regex, or what ever, and > > > wouldn’t require any escaping. > > > > This is very user-unfriendly, though. Users don't want to have to > > explicitly supply a version number when using the API, which is what this > > would force them to do. I don't think users are going to want to memorize > > that version 4 supprted "+", whereas version 3 only supported "[0-9]", or > > whatever. > > > > Just as an example, do you remember which versions of FetchRequest added > > which features? I don't. I always have to look at the code to remember. > > > > Also, escaping is still required any time you overload a character to mean > > two things. Escaping is required in the current proposal to be able to > > create a pattern that matches only "foo*". You have to type "foo\*" It > > would be required if we forced users to specify a version, as well. > > > > best, > > Colin > > > > > > > > Sent from my iPhone > > > > > > > On 7 May 2018, at 05:16, Piyush Vijay wrote: > > > > > > > > Makes sense. I'll update the KIP. > > > > > > > > Does anyone have any other comments? :) > > > > > > > > Thanks > > > > > > > > > > > > Piyush Vijay > > > > > > > >> On Thu, May 3, 2018 at 11:55 AM, Colin McCabe > > wrote: > > > >> > > > >> Yeah, I guess that's a good point. It probably makes sense to > > support the > > > >> prefix scheme for consumer groups and transactional IDs as well as > > topics. 
> > > >> > > > >> I agree that the current situation where anything goes in consumer > > group > > > >> names and transactional ID names is not ideal. I wish we could > > rewind the > > > >> clock and impose restrictions on the names. However, it doesn't seem > > > >> practical at the moment. Adding new restrictions would break a lot of > > > >> existing users after an upgrade. It would be a really bad upgrade > > > >> experience. > > > >> > > > >> However, I think we can support this in a compatible way. From the > > > >> perspective of AdminClient, we just have to add a new field to > > > >> R
Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient
On Wed, May 9, 2018, at 05:41, Viktor Somogyi wrote: > Hi Colin, > > > We are going to need to create a new version of AlterConfigsRequest to add > > the "incremental" boolean. So while we're doing that, maybe we can change > > the type to NULLABLE_STRING. > > I was just talking to a colleague yesterday and we came to the > conclusion that we should keep the boolean flag only on the client > side (as you may have suggested earlier?) and not make part of the > protocol as it might lead to a very complicated API on the long term. > Also we would keep the server side API simpler. Instead of the > protocol change we could just simply have the boolean flag in > AlterConfigOptions and the AdminClient should do the get-merge-set > logic which corresponds to the behavior of the current ConfigCommand. > That way we won't need to change the protocol for now but still have > both functionality. What do you think? Hi Viktor, Doing get-merge-set is buggy, though. If someone else does get-merge-set at the same time as you, you might overwrite that person's changes, or vice versa. So I really don't think we should try to do this. Also, having both an incremental and a full API is useful, and it's just a single boolean at the protocol and API level. > > > Hmm. Not sure I follow. KIP-133 doesn't use the empty string or > > "" to indicate defaults, does it? > > No it doesn't. It was just my early idea to indicate "delete" on the > protocol level. (We are using for denoting the default > client id or user in zookeeper.) Rajini was referring that we > shouldn't expose this to the protocol level but instead denote delete > with an empty string. > > > This comes from DescribeConfigsResponse. > > Unless I'm missing something, I think your suggestion to not expose > > "" is already implemented? > > In some way, yes. Although this one is used in describe and not in > alter. For alter I don't think we'd need my early "" idea. OK. Thanks for the explanation. Using an empty string to indicate delete, as Rajini suggested, seems pretty reasonable to me. null would work as well. > > >> And we use STRING rather than NULLABLE_STRING in describe configs etc. So > >> we > >> should probably do the same for quotas." > > > > I think nearly all responses treat ERROR_MESSAGE as a nullable string. > > CommonFields#ERROR_MESSAGE, which is used by most of them, is a nullable > > string. It's DescribeConfigsResponse that is the black sheep here. > > > > > public static final Field.NullableStr ERROR_MESSAGE = new > > Field.NullableStr("error_message", "Response error message"); > > Looking at DescribeConfigsResponse (and AlterConfigsResponse) they use > nullable_string in the code. KIP-133 states otherwise though. So in > this case it's not a problem luckily. Thanks for finding this inconsistency. I'll change the KIP to reflect what was actually implemented (nullable string for error). cheers, Colin > > > What about writing a small script that just handles setting up SCRAM > > credentials? It would probably be easier to maintain than the old config > > command. Otherwise we have to explain when each tool should be used, which > > will be confusing to users. > > I'd like that, yes :). > > Cheers, > Viktor > > On Mon, May 7, 2018 at 6:52 PM, Colin McCabe wrote: > > On Fri, May 4, 2018, at 05:49, Viktor Somogyi wrote: > >> Hi Colin, > >> > >> > Rather than breaking compatibility, we should simply add a new > >> > "incremental" boolean to AlterConfigsOptions. 
Callers can set this > >> > boolean to true when they want the update to be incremental. It should > >> > default to false so that old code continues to work. > >> > >> Agreed, let's do it this way. > >> > >> > Hmm. I don't think AlterOperation is necessary. If the user wants to > >> > delete a configuration key named "foo", they can create a ConfigEntry > >> > with name = "foo", value = null. > >> > >> AlterConfig's config type currently is string, so the only possibility > >> is to use an empty string as changing the type to nullable_string > >> could be breaking if the client code doesn't expect -1 as the string > >> size. In the discussion thread earlier we had a conversation about > >> this with Rajini, let me paste it here (so it gives some context). At > >> that point I had the text
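For reference, this is roughly what the client-side get-merge-set that Viktor describes would look like against the existing AdminClient API (describeConfigs and alterConfigs are real AdminClient methods; the helper itself is only a sketch). It also shows why Colin calls the approach racy: two concurrent writers can silently overwrite each other.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class GetMergeSetSketch {
        // Read the current topic config, overlay the requested changes, write everything back.
        // If another admin does the same thing concurrently, one of the updates is lost.
        static void updateTopicConfig(AdminClient admin, String topic,
                                      Map<String, String> changes) throws Exception {
            ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topic);
            Config current = admin.describeConfigs(Collections.singleton(resource))
                    .all().get().get(resource);
            Map<String, String> merged = new HashMap<>();
            for (ConfigEntry entry : current.entries()) {
                if (entry.value() != null) {   // sensitive entries are returned with a null value
                    merged.put(entry.name(), entry.value());
                }
            }
            merged.putAll(changes);
            Config updated = new Config(merged.entrySet().stream()
                    .map(e -> new ConfigEntry(e.getKey(), e.getValue()))
                    .collect(Collectors.toList()));
            admin.alterConfigs(Collections.singletonMap(resource, updated)).all().get();
        }
    }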
Re: [VOTE] KIP-278: Add version option to Kafka's commands
+1 (non-binding) Colin On Fri, May 11, 2018, at 12:35, Attila Sasvári wrote: > +1 (non-binding) > > Thomas Crayford ezt írta (időpont: 2018. máj. > 11., P 18:20): > > > +1 (non-binding) > > > > On Fri, May 11, 2018 at 5:17 PM, Guozhang Wang wrote: > > > > > Thanks Toru-san, +1 (binding) > > > > > > On Fri, May 11, 2018 at 8:30 AM, Jason Gustafson > > > wrote: > > > > > > > Thanks for the KIP! +1 (binding) > > > > > > > > On Fri, May 11, 2018 at 12:35 AM, Manikumar > > > > > > wrote: > > > > > > > > > +1 (non-binding) > > > > > > > > > > Thanks for the KIP. > > > > > > > > > > On Fri, May 11, 2018 at 12:56 PM, zhenya Sun > > wrote: > > > > > > > > > > > +1 building > > > > > > > 在 2018年5月11日,上午9:51,Ted Yu 写道: > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > On Thu, May 10, 2018 at 6:42 PM, Sasaki Toru < > > > > > sasaki...@oss.nttdata.com> > > > > > > > wrote: > > > > > > > > > > > > > >> Hi all, > > > > > > >> > > > > > > >> I would like to start the vote on KIP-278: Add version option to > > > > > Kafka's > > > > > > >> commands. > > > > > > >> > > > > > > >> The link to this KIP is here: > > > > > > >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-278+- > > > > > > >> +Add+version+option+to+Kafka%27s+commands> > > > > > > >> > > > > > > >> The discussion thread is here: > > > > > > >> < > > https://www.mail-archive.com/dev@kafka.apache.org/msg86688.html> > > > > > > >> > > > > > > >> > > > > > > >> Many thanks, > > > > > > >> Sasaki > > > > > > >> > > > > > > >> -- > > > > > > >> Sasaki Toru(sasaki...@oss.nttdata.com) NTT DATA CORPORATION > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > -- Guozhang > > > > >
Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs
Hi Piyush, I think AclBinding should operate the same way as AclBindingFilter. So you should be able to do something like this: > AclBindingFilter filter = new AclBindingFilter(new > ResourceFilter(ResourceType.GROUP, "foo*")) > AclBinding binding = new AclBinding(new Resource(ResourceType.GROUP, "foo*")) > assertTrue(filter.matches(binding)); Thinking about this more, it's starting to feel really messy to create new "pattern" constructors for Resource and ResourceFilter. I don't think people will be able to figure this out. Maybe we should just have a limited compatibility break here, where it is now required to escape weird consumer group names when creating ACLs for them. To future-proof this, we should reserve a bunch of characters at once, like *, @, $, %, ^, &, +, [, ], etc. If these characters appear in a resource name, it should be an error, unless they are escaped with a backslash. That way, we can use them in the future. We should create a Resource.escapeName function which adds the correct escape characters to resource names (so it would translate foo* into foo\*, foo+bar into foo\+bar, etc. etc.) best, Colin On Mon, May 14, 2018, at 17:08, Piyush Vijay wrote: > Colin, > > createAcls take a AclBinding, however, instead of AclBindingFilter. What > are your thoughts here? > > public abstract DescribeAclsResult describeAcls(AclBindingFilter > filter, DescribeAclsOptions options); > > public abstract CreateAclsResult createAcls(Collection > acls, CreateAclsOptions options); > > public abstract DeleteAclsResult > deleteAcls(Collection filters, DeleteAclsOptions > options); > > > Thanks > > Piyush Vijay > > On Mon, May 14, 2018 at 9:26 AM, Andy Coates wrote: > > > +1 > > > > On 11 May 2018 at 17:14, Colin McCabe wrote: > > > > > Hi Andy, > > > > > > I see what you mean. I guess my thought here is that if the fields are > > > private, we can change it later if we need to. > > > > > > I definitely agree that we should use the scheme you describe for sending > > > ACLs over the wire (just the string + version number) > > > > > > cheers, > > > Colin > > > > > > > > > On Fri, May 11, 2018, at 09:39, Andy Coates wrote: > > > > i think I'm agreeing with you. I was merely suggesting that having an > > > > additional field that controls how the current field is interpreted is > > > more > > > > flexible / extensible in the future than using a 'union' style > > approach, > > > > where only one of several possible fields should be populated. But > > it's a > > > > minor thing. > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 10 May 2018 at 09:29, Colin McCabe wrote: > > > > > > > > > Hi Andy, > > > > > > > > > > The issue that I was trying to solve here is the Java API. Right > > now, > > > > > someone can write "new ResourceFilter(ResourceType.TRANSACTIONAL_ID, > > > > > "foo*") and have a ResourceFilter that applies to a Transactional ID > > > named > > > > > "foo*". This has to continue to work, or else we have broken > > > compatibility. > > > > > > > > > > I was proposing that there would be something like a new function > > like > > > > > ResourceFilter.fromPattern(ResourceType.TRANSACTIONAL_ID, "foo*") > > > which > > > > > would create a ResourceFilter that applied to transactional IDs > > > starting > > > > > with "foo", rather than transactional IDs named "foo*" specifically. > > > > > > > > > > I don't think it's important whether the Java class has an integer, > > an > > > > > enum, or two string fields. 
The important thing is that there's a > > new > > > > > static function, or new constructor overload, etc. that works for > > > patterns > > > > > rather than literal strings. > > > > > > > > > > On Thu, May 10, 2018, at 03:30, Andy Coates wrote: > > > > > > Rather than having name and pattern fields on the ResourceFilter, > > > where > > > > > > it’s only valid for one to be set, and we want to restrict the > > > character > > > > > > set in case future enhancements need them, we could instead add a > > new > >
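A possible shape for the Resource.escapeName helper mentioned above, assuming the reserved-character set is the one listed in the message (the exact set was still under discussion; the class name is illustrative):

    public class ResourceNameEscaper {
        // Characters reserved for future pattern syntax; a literal occurrence must be escaped.
        private static final String RESERVED = "*@$%^&+[]";

        public static String escapeName(String name) {
            StringBuilder sb = new StringBuilder();
            for (char c : name.toCharArray()) {
                if (c == '\\' || RESERVED.indexOf(c) >= 0) {
                    sb.append('\\');
                }
                sb.append(c);
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            System.out.println(escapeName("foo*"));     // foo\*
            System.out.println(escapeName("foo+bar"));  // foo\+bar
        }
    }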
Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations
Hi Robert, Thanks for posting this. In the past we've been kind of reluctant to add more complexity to configuration. I think Connect does have a clear need for this kind of functionality, though. As you mention, Connect integrates with external systems, which are very likely to have passwords stored in Vault, KeyWhiz or some other external system. The KIP says that "Vault is very popular and has been described as 'the current gold standard in secret management and provisioning'." I think this might be a bit too much detail -- we don't really need to pick favorites, right? :) I think we should make configuration consistent between the broker and Connect. If people can use constructs like jdbc.config.key="${vault:jdbc.user}${vault:jdbc.password}" in Connect, they'll want to do it on the broker too, in a consistent way. If I understand correctly, ConfigProvider represents an external configuration source, such as VaultConfigProvider, KeyWhizConfigProvider, etc. I think we should make the substitution part of the generic configuration code, rather than specific to individual ConfigProviders. We don't really want it to work differently for Vault vs. KeyWhiz vs. AWS secrets, etc. etc. We should also spell out exactly how substitution works. For example, is substitution limited to 1 level deep? In other words, if I have foo="${bar}" and bar=${baz}, probably foo should just be set equal to "${baz}" rather than chasing more than one level of indirection. We should also spell out how this interacts with KIP-226 configurations. I would suggest that KIP-226 variables not be subjected to substitution. The reason is because in theory substitution could lead to different results on different brokers, since the different brokers may not have the same ConfigProviders configured. Also, having substitutions in the KIP-226 configuration makes it more difficult for the admin to understand what the centrally managed configuration is. It seems the main goal is the ability to load a batch of key/value pairs from the ConfigProvider, and the ability to subscribe to notifications about changes to certain parameters. Maybe a good generic interface would be like this: > public interface ConfigProvider extends Closeable { > // batched get is potentially more efficient > Map<String, String> get(Collection<String> keys); > >// The ConfigProvider is responsible for making this callback whenever the > key changes. >// Some ConfigProviders may want to have a background thread with a > configurable update interval. > void subscribe(String key, ConfigurationChangeCallback callback); > >// Inverse of subscribe > void unsubscribe(String key); > >// Close all subscriptions and clean up all resources > void close(); > } > > interface ConfigurationChangeCallback { > void onChange(String key, String value); > } With regard to ConfigTransformer: do we need to include all this code in the KIP? Seems like an implementation detail. > Other connectors such as the S3 connector are tightly coupled with a > particular secret manager, and may > wish to handle rotation on their own. Is there a way to avoid this coupling? Seems like some users might want to use their own secret manager here. best, Colin On Wed, May 9, 2018, at 16:32, Robert Yokota wrote: > Hi Magesh, > > I updated the KIP with a link to a PR for a working prototype. The > prototype does not yet use the Connect plugin machinery for class loader > isolation, but should give you an idea of what the final implementation > will look like. 
Here is the link: > https://github.com/apache/kafka/pull/4990/files. > > I also added an example of a FileConfigProvider to the KIP. > > Thanks, > Robert > > On Wed, May 9, 2018 at 10:04 AM, Robert Yokota wrote: > > > Hi Magesh, > > > > Thanks for the feedback! > > > > I will put together a PR to demonstrate what the implementation might look > > like, as well as a reference FileConfigProvider. > > > > 1. The delayMs for a (potentially) scheduled reload is determined by the > > ConfigProvider. For example, a (hypothetical) VaultConfigProvider, upon > > contacting Vault for a particular secret, might also obtain a lease > > duration indicating that the secret expires in 1 hour. The > > VaultConfigProvider could then call scheduleConfigReload with delayMs set > > to 360ms (1 hour). This would cause the Connector to restart in an > > hour, forcing it to reload the configs and re-resolve all indirect > > references. > > > > 2. Yes, the start() methods in SourceTask and SinkTask would get the > > configs with all the indirect references resolved. Those config() methods
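As a concrete, if simplified, example of the FileConfigProvider idea referenced above, here is a provider whose get(path) loads a Java properties file, so a value such as ${file:/etc/connect/secrets.properties:db.password} could be resolved from local disk. The class name and interface shape are illustrative; the real implementation plugs into the ConfigProvider interface defined in the KIP.

    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    public class FileConfigProviderSketch {
        // Returns all key/value pairs found in the properties file at the given path.
        public Map<String, String> get(String path) throws IOException {
            Properties props = new Properties();
            try (FileReader reader = new FileReader(path)) {
                props.load(reader);
            }
            Map<String, String> result = new HashMap<>();
            for (String name : props.stringPropertyNames()) {
                result.put(name, props.getProperty(name));
            }
            return result;
        }
    }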
Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs
> On Tue, May 15, 2018 at 7:18 PM, Rajini Sivaram > wrote: > > > Hi Piyush, > > > > It is possible to configure PrincipalBuilder for SASL ( > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > 189%3A+Improve+principal+builder+interface+and+add+support+for+SASL). If > > that satisfies your requirements, perhaps we can move wildcarded principals > > out of this KIP and focus on wildcarded resources? +1. We also need to determine which characters will be reserved for the future. I think previously we thought about @, #, $, %, ^, &, *. > > On Tue, May 15, 2018 at 7:15 PM, Piyush Vijay > > wrote: > > > >> Hi Colin, > >> > >> Escaping at this level is making sense to me but let me think more and get > >> back to you. Thanks, Piyush. What questions do you think are still open regarding escape characters? As Rajini mentioned, we have to get this in soon in order to make the KIP freeze. > >> > >> But should we not just get rid of one of AclBinding or AclBindingFilter > >> then? Is there a reason to keep both given that AclBindingFilter and > >> AclBinding look exact copy of each other after this change? This will be a > >> one-time breaking change in APIs marked as "Evolving", but makes sense in > >> the long term? Am I missing something here? AclBinding represents an ACL. AclBindingFilter is a filter which can be used to locate AclBinding objects. Similarly with Resource and ResourceFilter. There is no reason to combine them because they represent different things. Although they contain many of the same fields, they are not exact copies. Many fields can be null in AclBindingFilter-- fields can never be null in AclBinding. For example, you can have an AclBindingFilter that matches every AclBinding. There is more discussion of this on the original KIP that added ACL support to AdminClient. best, Colin > >> > >> > >> > >> Piyush Vijay > >> > >> On Tue, May 15, 2018 at 9:01 AM, Colin McCabe wrote: > >> > >> > Hi Piyush, > >> > > >> > I think AclBinding should operate the same way as AclBindingFilter. > >> > > >> > So you should be able to do something like this: > >> > > AclBindingFilter filter = new AclBindingFiler(new > >> > ResourceFilter(ResourceType.GROUP, "foo*")) > >> > > AclBinding binding = new AclBinding(new Resource(ResourceType.GROUP, > >> > "foo*")) > >> > > assertTrue(filter.matches(binding)); > >> > > >> > Thinking about this more, it's starting to feel really messy to create > >> new > >> > "pattern" constructors for Resource and ResourceFilter. I don't think > >> > people will be able to figure this out. Maybe we should just have a > >> > limited compatibility break here, where it is now required to escape > >> weird > >> > consumer group names when creating ACLs for them. > >> > > >> > To future-proof this, we should reserve a bunch of characters at once, > >> > like *, @, $, %, ^, &, +, [, ], etc. If these characters appear in a > >> > resource name, it should be an error, unless they are escaped with a > >> > backslash. That way, we can use them in the future. We should create a > >> > Resource.escapeName function which adds the correct escape characters to > >> > resource names (so it would translate foo* into foo\*, foo+bar into > >> > foo\+bar, etc. etc. > >> > > >> > best, > >> > Colin > >> > > >> > > >> > On Mon, May 14, 2018, at 17:08, Piyush Vijay wrote: > >> > > Colin, > >> > > > >> > > createAcls take a AclBinding, however, instead of AclBindingFilter. > >> What > >> > > are your thoughts here? 
> >> > > > >> > > public abstract DescribeAclsResult describeAcls(AclBindingFilter > >> > > filter, DescribeAclsOptions options); > >> > > > >> > > public abstract CreateAclsResult createAcls(Collection > >> > > acls, CreateAclsOptions options); > >> > > > >> > > public abstract DeleteAclsResult > >> > > deleteAcls(Collection filters, DeleteAclsOptions > >> > > options); > >> > > > >> > > > >> > > Thanks > >> > > > >> > > Piyush Vijay > >> > > > >> > > On Mon, May 14, 2018 at 9
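To illustrate the binding-versus-filter distinction Colin describes, here is a stripped-down sketch. The real AclBinding and AclBindingFilter classes also carry principal, host, operation, and permission type; the names below are simplified stand-ins.

    import java.util.Objects;

    public class AclFilterSketch {
        // In a binding, every field is concrete; in a filter, a null field means "match anything".
        static final class Binding {
            final String resourceType;
            final String resourceName;
            Binding(String resourceType, String resourceName) {
                this.resourceType = Objects.requireNonNull(resourceType);
                this.resourceName = Objects.requireNonNull(resourceName);
            }
        }

        static final class Filter {
            final String resourceType;  // null matches any type
            final String resourceName;  // null matches any name
            Filter(String resourceType, String resourceName) {
                this.resourceType = resourceType;
                this.resourceName = resourceName;
            }
            boolean matches(Binding b) {
                return (resourceType == null || resourceType.equals(b.resourceType))
                    && (resourceName == null || resourceName.equals(b.resourceName));
            }
        }

        public static void main(String[] args) {
            Filter matchEverything = new Filter(null, null);
            System.out.println(matchEverything.matches(new Binding("GROUP", "foo")));  // true
        }
    }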
Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations
Thanks, Robert. Looks good overall. As a clarification about the indirections, what if I have the connect configuration key foo set up as ${vault:bar}, and in Vault, have the bar key set to ${file:baz}? Does connect get foo as the contents of the baz file? I would argue that it should not (and in general, we shouldn't allow ConfigProviders to indirect to other ConfigProviders) but I don't think it's spelled out right now. What's the behavior when a config key is not found in Vault (or other ConfigProvider)? Does the variable get replaced with the empty string, or with the literal ${vault:whatever} string? Do we really need "${provider:[path:]key}", or can it just be ${provider:key}? It seems like the path can be rolled up into the key. So if you want to put your connect keys under my.connect.path, you ask for ${vault:my.connect.path.jdbc.config}, etc. >// A delayMs of 0 indicates an immediate change; a positive delayMs > indicates >// that a future change is anticipated (such as a lease duration) >void onChange(String path, Map<String, String> values, int delayMs); Do we really need delayMs? It seems like if you get a callback with delayMs set, you don't know what the new values will be, only that an update is coming, but not yet here. best, Colin On Wed, May 16, 2018, at 17:05, Robert Yokota wrote: > Hello everyone, > > After a good round of discussions with excellent feedback and no major > objections, I would like to start a vote on KIP-297 to externalize secrets > from Kafka Connect configurations. My thanks in advance! > > KIP: < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations > > > > JIRA: <https://issues.apache.org/jira/browse/KAFKA-6886> > > Discussion thread: < > https://www.mail-archive.com/dev@kafka.apache.org/msg87638.html> > > Best, > Robert
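For readers following the ${provider:[path:]key} question, here is a small sketch of how such a reference can be picked apart. The regular expression is illustrative only and ignores escaping rules, which the KIP defines separately.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class VariableSyntaxSketch {
        // ${provider:key} or ${provider:path:key}; the middle path segment is optional.
        private static final Pattern VAR =
                Pattern.compile("\\$\\{([^:}]+):(?:([^:}]+):)?([^}]+)\\}");

        public static void main(String[] args) {
            Matcher m = VAR.matcher("db_password=${vault:secret/staging:mysql_password}");
            if (m.find()) {
                System.out.println("provider = " + m.group(1));  // vault
                System.out.println("path     = " + m.group(2));  // secret/staging
                System.out.println("key      = " + m.group(3));  // mysql_password
            }
        }
    }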
Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations
Thanks, Robert. With regard to delayMs, can’t we just restart the Connector when the keys are actually changed? Or is the concern that this would lengthen the effective key rotation time? Can’t the user just configure a slightly shorter key rotation time to counteract this concern? Regards, Colin On Wed, May 16, 2018, at 19:13, Robert Yokota wrote: > Hi Colin, > > Good questions. > > > > As a clarification about the indirections, what if I have the > > connect> configuration key foo set up as ${vault:bar}, and in Vault, > have the bar> key set to ${file:baz}? > > Does connect get foo as the contents of the baz file? I would > > argue that> it should not (and in general, we shouldn't allow > > ConfigProviders to > indirect to other > > ConfigProviders) but I don't think it's spelled out right now. > > I've added a clarification to the KIP that further indirections are > not> performed even if the values returned from ConfigProviders have the > variable syntax. > > > > What's the behavior when a config key is not found in Vault > > (or other> ConfigProvider)? Does the variable get replaced with the empty > string, or> with the literal ${vault:whatever} string? > > It would remain unresolved and still be of the form > ${provider:key}. I've> added a clarification to the KIP. > > > > Do we really need "${provider:[path:]key}", or can it just be > ${provider:key}? > > The path is a separate parameter in the APIs, so I think it's > important to> explicitly delineate it in the variable syntax. For example, I > currently> have a working VaultConfigProvider prototype and the syntax for a > Vault key> reference looks like > > db_password=${vault:secret/staging:mysql_password} > > I think it's important to standardize how to separate the path > from the key> rather than leave it to each ConfigProvider to determine a > possibly > different way. This will also make it easier to move secrets from one> > ConfigProvider to another should one choose to do so. > > > > Do we really need delayMs? > > One of the goals of this KIP is to allow for secrets rotation without> having > to modify existing connectors. In the case of the > VaultConfigProvider, it knows the lease durations and will be able to> > schedule a restart of the Connector using an API in the Herder. The > delayMs will simply be passed to the Herder.restartConnector(long > delayMs,> String connName, Callback cb) method here: > > https://github.com/rayokota/kafka/blob/secrets-in-connect-configs/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Herder.java#L170> > > Best, > Robert > > > > On Wed, May 16, 2018 at 6:16 PM, Colin McCabe > wrote:> > > Thanks, Robert. Looks good overall. > > > > As a clarification about the indirections, what if I have the > > connect> > configuration key foo set up as ${vault:bar}, and in Vault, have > > the bar> > key set to ${file:baz}? Does connect get foo as the contents of > > the baz> > file? I would argue that it should not (and in general, we > > shouldn't allow> > ConfigProviders to indirect to other ConfigProviders) > > but I > > don't think> > it's spelled out right now. > > > > What's the behavior when a config key is not found in Vault > > (or other> > ConfigProvider)? Does the variable get replaced with the empty > > string, or> > with the literal ${vault:whatever} string? > > > > Do we really need "${provider:[path:]key}", or can it just be > > ${provider:key}? It seems like the path can be rolled up into the > > key. 
So> > if you want to put your connect keys under my.connect.path, you > > ask for> > ${vault:my.connect.path.jdbc.config}, etc. > > > > >// A delayMs of 0 indicates an immediate change; a positive > > >delayMs> > indicates > > >// that a future change is anticipated (such as a lease > > >duration)> > >void onChange(String path, Map > > > values, int > > >delayMs);> > > > Do we really need delayMs? It seems like if you get a callback with> > > > delayMs set, you don't know what the new values will be, only > > that an> > update is coming, but not yet here. > > > > best, > > Colin > > > > > > On Wed, May 16, 2018, at 17:05, Robert Yokota wrote: > > > Hello everyone, > > > > > > After a good round of discussions with excellent feedback and no > > > major> > > objections, I would like to start a vote on KIP-297 to > > > externalize> > secrets > > > from Kafka Connect configurations. My thanks in advance! > > > > > > KIP: < > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > 297%3A+Externalizing+Secrets+for+Connect+Configurations > > > > > > > > > > JIRA: <https://issues.apache.org/jira/browse/KAFKA-6886> > > > > > > Discussion thread: < > > > https://www.mail-archive.com/dev@kafka.apache.org/msg87638.html> > > > > > > Best, > > > Robert > >
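A sketch of the lease-duration idea in Robert's reply: when the provider reports that the secrets it served expire in ttlMs, the runtime schedules a connector restart so the configuration is re-resolved. Herder.restartConnector is the hook mentioned above; the class below only illustrates the scheduling and is not Connect code.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class LeaseDrivenRestartSketch {
        private final ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();

        // e.g. Vault returned a one-hour lease: ttlMs = 3_600_000
        public void onLease(long ttlMs, Runnable restartConnector) {
            executor.schedule(restartConnector, ttlMs, TimeUnit.MILLISECONDS);
        }
    }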
Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient
Hi Viktor, The shell command isn’t that easy to integrate into applications. AdminClient will get integrated into a lot more stuff, which increases the potential for conflicts. I would argue that we should fix this soon. If we do want to reduce the scope in this KIP, we could do the merge in the ConfigCommand tool for now, and leave AC unchanged. Best, Colin On Wed, May 16, 2018, at 04:57, Viktor Somogyi wrote: > Hi Colin, > > > Doing get-merge-set is buggy, though. If someone else does get-merge- > > set at the same time as you, you might overwrite that person's > > changes, or vice versa. So I really don't think we should try to do > > this. Also, having both an incremental and a full API is useful, > > and it's just a single boolean at the protocol and API level.> > Overwriting somebody's change is currently possible with the > ConfigCommand, as it will do this get-merge-set behavior on the client> side, > in the command. From this perspective I think it's not much > different to do this with the admin client. Also I think admins don't> modify > the quotas/configs of a client/user/topic/broker often (and > multiple admins would do it even more rarely), so I don't think it is> a big > issue. What I think would be useful here but may be out of scope> is to > version the changes similarly to leader epochs. So when an admin> updates the > configs, it will increment a version number and won't let> other admins to > push changes in with lower than that. Instead it would> return an error. > > I would be also interested what others think about this? > > Cheers, > Viktor > > > On Sat, May 12, 2018 at 2:29 AM, Colin McCabe > wrote:> > On Wed, May 9, 2018, at 05:41, Viktor Somogyi > wrote: > >> Hi Colin, > >> > >> > We are going to need to create a new version of > >> > AlterConfigsRequest to add the "incremental" boolean. So while > >> > we're doing that, maybe we can change the type to > >> > NULLABLE_STRING.> >> > >> I was just talking to a colleague yesterday and we came to the > >> conclusion that we should keep the boolean flag only on the client> >> > >> side (as you may have suggested earlier?) and not make part of the> >> > >> protocol as it might lead to a very complicated API on the long > >> term.> >> Also we would keep the server side API simpler. Instead of the > >> protocol change we could just simply have the boolean flag in > >> AlterConfigOptions and the AdminClient should do the get-merge-set> >> > >> logic which corresponds to the behavior of the current > >> ConfigCommand.> >> That way we won't need to change the protocol for now > >> but > >> still have> >> both functionality. What do you think? > > > > Hi Viktor, > > > > Doing get-merge-set is buggy, though. If someone else does get-merge- > > set at the same time as you, you might overwrite that person's > > changes, or vice versa. So I really don't think we should try to do > > this. Also, having both an incremental and a full API is useful, > > and it's just a single boolean at the protocol and API level.> > > >> > >> > Hmm. Not sure I follow. KIP-133 doesn't use the empty string or > >> > "" to indicate defaults, does it?> >> > >> No it doesn't. It was just my early idea to indicate "delete" > >> on the> >> protocol level. (We are using for denoting the default > >> client id or user in zookeeper.) Rajini was referring that we > >> shouldn't expose this to the protocol level but instead denote > >> delete> >> with an empty string. > >> > >> > This comes from DescribeConfigsResponse. 
> >> > Unless I'm missing something, I think your suggestion to not > >> > expose "" is already implemented?> >> > >> In some way, yes. Although this one is used in describe and not in> >> > >> alter. For alter I don't think we'd need my early "" idea.> > > > OK. Thanks for the explanation. Using an empty string to indicate > > delete, as Rajini suggested, seems pretty reasonable to me. null > > would work as well.> > > >> > >> >> And we use STRING rather than NULLABLE_STRING in describe > >> >> configs etc. So we> >> >> should probably do the same for quotas." > >> > > >> > I think nearl
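Viktor's idea of versioning config changes (quoted above) would amount to a compare-and-set update. A rough sketch of the semantics follows; it is not a proposal for the actual protocol.

    import java.util.HashMap;
    import java.util.Map;

    public class VersionedConfigSketch {
        private long version = 0;
        private final Map<String, String> configs = new HashMap<>();

        // Returns the version the caller read, to be passed back on alter.
        public synchronized long describe(Map<String, String> out) {
            out.putAll(configs);
            return version;
        }

        // Rejects the update if another admin changed the configs since the caller's read.
        public synchronized void alter(long expectedVersion, Map<String, String> changes) {
            if (expectedVersion != version) {
                throw new IllegalStateException("Configs changed concurrently; re-read and retry");
            }
            configs.putAll(changes);
            version++;
        }
    }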
Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs
Thanks, Piyush. +1 for starting the vote soon. Can you please also add a discussion about escaping? For example, earlier we discussed using backslashes to escape special characters. So that users can create an ACL referring to a literal "foo*" group by creating an ACL for "foo\*" Similarly, you can get a literal backslash with "\\". This is the standard UNIX escaping mechanism. Also, for the section that says "Changes to AdminClient (needs discussion)", we need a new method that will allow users to escape consumer group names and other names. So you can feed this method your "foo\*" consumer group name, and it will give you "foo\\\*", which is what you would need to use to create an ACL for this consumer group in AdminClient. I think that's the only change we need to admin client regards, Colin On Thu, May 17, 2018, at 08:55, Piyush Vijay wrote: > Hi Rajini/Colin, > > I will remove the wildcard principals from the scope for now, updating KIP > right now and will open it for vote. > > Thanks > > > Piyush Vijay > > On Thu, May 17, 2018 at 6:59 AM, Rajini Sivaram > wrote: > > > Hi Piyush, > > > > I have added a PR (https://github.com/apache/kafka/pull/5030) with tests > > to > > show how group principals can be used for authorization with custom > > principal builders. One of the tests uses SASL. It is not quite the same as > > a full-fledged user groups, but since it works with all security protocols, > > it could be an alternative to wildcarded principals. > > > > Let us know if we can help in any way to get this KIP updated and ready for > > voting to include in 2.0.0. > > > > Thanks, > > > > Rajini > > > > > > On Wed, May 16, 2018 at 10:21 PM, Colin McCabe wrote: > > > > > > On Tue, May 15, 2018 at 7:18 PM, Rajini Sivaram < > > rajinisiva...@gmail.com > > > > > > > > wrote: > > > > > > > > > Hi Piyush, > > > > > > > > > > It is possible to configure PrincipalBuilder for SASL ( > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > 189%3A+Improve+principal+builder+interface+and+add+ > > support+for+SASL). > > > If > > > > > that satisfies your requirements, perhaps we can move wildcarded > > > principals > > > > > out of this KIP and focus on wildcarded resources? > > > > > > +1. > > > > > > We also need to determine which characters will be reserved for the > > > future. I think previously we thought about @, #, $, %, ^, &, *. > > > > > > > > On Tue, May 15, 2018 at 7:15 PM, Piyush Vijay < > > piyushvij...@gmail.com> > > > > > wrote: > > > > > > > > > >> Hi Colin, > > > > >> > > > > >> Escaping at this level is making sense to me but let me think more > > > and get > > > > >> back to you. > > > > > > Thanks, Piyush. What questions do you think are still open regarding > > > escape characters? > > > As Rajini mentioned, we have to get this in soon in order to make the KIP > > > freeze. > > > > > > > >> > > > > >> But should we not just get rid of one of AclBinding or > > > AclBindingFilter > > > > >> then? Is there a reason to keep both given that AclBindingFilter and > > > > >> AclBinding look exact copy of each other after this change? This > > will > > > be a > > > > >> one-time breaking change in APIs marked as "Evolving", but makes > > > sense in > > > > >> the long term? Am I missing something here? > > > > > > AclBinding represents an ACL. AclBindingFilter is a filter which can be > > > used to locate AclBinding objects. Similarly with Resource and > > > ResourceFilter. There is no reason to combine them because they > > represent > > > different things. 
Although they contain many of the same fields, they > > are > > > not exact copies. Many fields can be null in AclBindingFilter-- fields > > can > > > never be null in AclBinding. > > > > > > For example, you can have an AclBindingFilter that matches every > > > AclBinding. There is more discussion of this on the original KIP that > > > added ACL support to AdminClient. > > > > > > best, > > > Colin > > > > > > > >> > > >
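A small, self-contained example of the double escaping described above; the escape helper is illustrative, standing in for the proposed AdminClient method.

    public class AclEscapingExample {
        // Prefixes backslash and star with a backslash.
        static String escape(String name) {
            return name.replace("\\", "\\\\").replace("*", "\\*");
        }

        public static void main(String[] args) {
            System.out.println(escape("foo*"));    // foo\*   -> ACL for the group literally named foo*
            System.out.println(escape("foo\\*"));  // foo\\\* -> ACL for the group literally named foo\*
        }
    }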
Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations
Hi Robert, Hmm. I thought that if you're using ConfigChangeCallback, you are relying on the ConfigProvider to make a callback to you when the configuration has changed. So isn't that always the "push model" (where the ConfigProvider pushes changes to Connect). If you want the "pull model" where you initiate updates, you can simply call ConfigProvider#get directly, right? The actual implementation of ConfigProvider subclasses will depend on the type of configuration storage mechanism on the backend. In the case of Vault, it sounds like we need to have something like a ScheduledExecutor which re-fetches keys after a certain amount of time. As an aside, what does a "lease duration" mean for a configuration key? Does that mean Vault will reject changes to the configuration key if I try to make them within the lease duration? Or is this like a period after which a password is automatically rotated? On Wed, May 16, 2018, at 22:25, Robert Yokota wrote: > Hi Colin, > > > With regard to delayMs, can’t we just restart the > > Connector when the keys are actually changed? > > Currently the VaultConfigProvider does not find out when values for keys > have changed. You could do this with a poll model (with a > background thread in the ConfigProvider), but since for each key-value > pair, Vault provides a lease duration stating exactly when a value for a > key will change in the future, this is an alternative model of just passing > the lease duration to the client (in this case the Connector), to allow it > to determine what to do (such as schedule a restart). This may allow one > to avoid the complexity of figuring out a proper poll interval (with lease > durations of varying periods), or worrying about putting too much load on > the secrets manager by polling too often. Those things are still concerns if the Connector is polling, right? Perhaps the connector poll too often and puts too much load on Vault. And so forth. It seems like this problem needs to be solved either way (and probably can be solved with reasonable default minimum fetch intervals). best, Colin > In other words, by adding this > one additional parameter, a ConfigProvider can provide both push and pull > models to clients, perhaps with an additional configuration parameter to > the ConfigProvider to determine which model (push or poll) to use. > > Thanks, > Robert > > On Wed, May 16, 2018 at 9:56 PM, Colin McCabe wrote: > > > Thanks, Robert. With regard to delayMs, can’t we just restart the > > Connector when the keys are actually changed? Or is the concern that > > this would lengthen the effective key rotation time? Can’t the user > > just configure a slightly shorter key rotation time to counteract > > this concern? > > Regards, > > Colin > > > > On Wed, May 16, 2018, at 19:13, Robert Yokota wrote: > > > Hi Colin, > > > > > > Good questions. > > > > > > > > > > As a clarification about the indirections, what if I have the > > > > connect> configuration key foo set up as ${vault:bar}, and in Vault, > > > have the bar> key set to ${file:baz}? > > > > Does connect get foo as the contents of the baz file? I would > > > > argue that> it should not (and in general, we shouldn't allow > > ConfigProviders to > > > indirect to other > > > > ConfigProviders) but I don't think it's spelled out right now. > > > > > > I've added a clarification to the KIP that further indirections are > > > not> performed even if the values returned from ConfigProviders have the > > > variable syntax. 
> > > > > > > > > > What's the behavior when a config key is not found in Vault > > > > (or other> ConfigProvider)? Does the variable get replaced with the > > empty > > > string, or> with the literal ${vault:whatever} string? > > > > > > It would remain unresolved and still be of the form > > > ${provider:key}. I've> added a clarification to the KIP. > > > > > > > > > > Do we really need "${provider:[path:]key}", or can it just be > > > ${provider:key}? > > > > > > The path is a separate parameter in the APIs, so I think it's > > > important to> explicitly delineate it in the variable syntax. For > > example, I > > > currently> have a working VaultConfigProvider prototype and the syntax > > for a > > > Vault key> reference looks like > > > > > > db_password=${vault:secret/staging:mysql_password} > > > > > > I
Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient
Hi Viktor, Since the KIP freeze is coming up really soon, maybe we should just drop the section about changes to AlterConfigs from KIP-248. We don't really need it here, since ConfigCommand can use AlterConfigs as-is. We can pick up the discussion about improving AlterConfigs in a future KIP. cheers, Colin On Wed, May 16, 2018, at 22:06, Colin McCabe wrote: > Hi Viktor, > > The shell command isn’t that easy to integrate into applications. > AdminClient will get integrated into a lot more stuff, which > increases the potential for conflicts. I would argue that we should > fix this soon. > If we do want to reduce the scope in this KIP, we could do the merge in > the ConfigCommand tool for now, and leave AC unchanged. > Best, > Colin > > > On Wed, May 16, 2018, at 04:57, Viktor Somogyi wrote: > > Hi Colin, > > > > > Doing get-merge-set is buggy, though. If someone else does get-merge- > > > set at the same time as you, you might overwrite that person's > > > changes, or vice versa. So I really don't think we should try to do > > > this. Also, having both an incremental and a full API is useful, > > > and it's just a single boolean at the protocol and API level.> > > Overwriting somebody's change is currently possible with the > > ConfigCommand, as it will do this get-merge-set behavior on the client> > > side, in the command. From this perspective I think it's not much > > different to do this with the admin client. Also I think admins don't> > > modify the quotas/configs of a client/user/topic/broker often (and > > multiple admins would do it even more rarely), so I don't think it is> a > > big issue. What I think would be useful here but may be out of scope> is to > > version the changes similarly to leader epochs. So when an admin> updates > > the configs, it will increment a version number and won't let> other admins > > to push changes in with lower than that. Instead it would> return an error. > > > > I would be also interested what others think about this? > > > > Cheers, > > Viktor > > > > > > On Sat, May 12, 2018 at 2:29 AM, Colin McCabe > > wrote:> > On Wed, May 9, 2018, at 05:41, Viktor > > Somogyi wrote: > > >> Hi Colin, > > >> > > >> > We are going to need to create a new version of > > >> > AlterConfigsRequest to add the "incremental" boolean. So while > > >> > we're doing that, maybe we can change the type to > > >> > NULLABLE_STRING.> >> > > >> I was just talking to a colleague yesterday and we came to the > > >> conclusion that we should keep the boolean flag only on the client> >> > > >> side (as you may have suggested earlier?) and not make part of the> >> > > >> protocol as it might lead to a very complicated API on the long > > >> term.> >> Also we would keep the server side API simpler. Instead of the > > >> protocol change we could just simply have the boolean flag in > > >> AlterConfigOptions and the AdminClient should do the get-merge-set> >> > > >> logic which corresponds to the behavior of the current > > >> ConfigCommand.> >> That way we won't need to change the protocol for now > > >> but > > >> still have> >> both functionality. What do you think? > > > > > > Hi Viktor, > > > > > > Doing get-merge-set is buggy, though. If someone else does get-merge- > > > set at the same time as you, you might overwrite that person's > > > changes, or vice versa. So I really don't think we should try to do > > > this. Also, having both an incremental and a full API is useful, > > > and it's just a single boolean at the protocol and API level.> > > > >> > > >> > Hmm. 
Not sure I follow. KIP-133 doesn't use the empty string or > > >> > "" to indicate defaults, does it?> >> > > >> No it doesn't. It was just my early idea to indicate "delete" > > >> on the> >> protocol level. (We are using for denoting the > > >> default > > >> client id or user in zookeeper.) Rajini was referring that we > > >> shouldn't expose this to the protocol level but instead denote > > >> delete> >> with an empty string. > > >> > > >> > This comes from DescribeConfigsResponse. > > >> > Unless I'
Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs
On Thu, May 17, 2018, at 09:28, Piyush Vijay wrote: > I was planning to do that. > > Another unrelated detail is the presence of the support for ‘*’ ACL > currently. Looks like we’ll have to keep supporting this as a special case, > even though using a different location for wildcard-suffix ACLs on Zk. +1. Thanks, Piyush. Colin > > > > On Thu, May 17, 2018 at 9:15 AM Colin McCabe wrote: > > > Thanks, Piyush. +1 for starting the vote soon. > > > > Can you please also add a discussion about escaping? For example, earlier > > we discussed using backslashes to escape special characters. So that users > > can create an ACL referring to a literal "foo*" group by creating an ACL > > for "foo\*" Similarly, you can get a literal backslash with "\\". This is > > the standard UNIX escaping mechanism. > > > > Also, for the section that says "Changes to AdminClient (needs > > discussion)", we need a new method that will allow users to escape consumer > > group names and other names. So you can feed this method your "foo\*" > > consumer group name, and it will give you "foo\\\*", which is what you > > would need to use to create an ACL for this consumer group in AdminClient. > > I think that's the only change we need to admin client > > > > regards, > > Colin > > > > > > On Thu, May 17, 2018, at 08:55, Piyush Vijay wrote: > > > Hi Rajini/Colin, > > > > > > I will remove the wildcard principals from the scope for now, updating > > KIP > > > right now and will open it for vote. > > > > > > Thanks > > > > > > > > > Piyush Vijay > > > > > > On Thu, May 17, 2018 at 6:59 AM, Rajini Sivaram > > > > > wrote: > > > > > > > Hi Piyush, > > > > > > > > I have added a PR (https://github.com/apache/kafka/pull/5030) with > > tests > > > > to > > > > show how group principals can be used for authorization with custom > > > > principal builders. One of the tests uses SASL. It is not quite the > > same as > > > > a full-fledged user groups, but since it works with all security > > protocols, > > > > it could be an alternative to wildcarded principals. > > > > > > > > Let us know if we can help in any way to get this KIP updated and > > ready for > > > > voting to include in 2.0.0. > > > > > > > > Thanks, > > > > > > > > Rajini > > > > > > > > > > > > On Wed, May 16, 2018 at 10:21 PM, Colin McCabe > > wrote: > > > > > > > > > > On Tue, May 15, 2018 at 7:18 PM, Rajini Sivaram < > > > > rajinisiva...@gmail.com > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi Piyush, > > > > > > > > > > > > > > It is possible to configure PrincipalBuilder for SASL ( > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > > 189%3A+Improve+principal+builder+interface+and+add+ > > > > support+for+SASL). > > > > > If > > > > > > > that satisfies your requirements, perhaps we can move wildcarded > > > > > principals > > > > > > > out of this KIP and focus on wildcarded resources? > > > > > > > > > > +1. > > > > > > > > > > We also need to determine which characters will be reserved for the > > > > > future. I think previously we thought about @, #, $, %, ^, &, *. > > > > > > > > > > > > On Tue, May 15, 2018 at 7:15 PM, Piyush Vijay < > > > > piyushvij...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > >> Hi Colin, > > > > > > >> > > > > > > >> Escaping at this level is making sense to me but let me think > > more > > > > > and get > > > > > > >> back to you. > > > > > > > > > > Thanks, Piyush. What questions do you think are still open regarding > > > > > escape characters? 
> > > > > As Rajini mentioned, we have to get this in soon in order to make > > the KIP > > > > > freeze. > > > > > > > > > > > >> > > > > > >
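For context on the special case Piyush mentions, here is an illustrative sketch of how three kinds of resource ACLs would coexist at authorization time: the legacy '*' (match-everything) ACL, literal names, and the new wildcard-suffix ACLs. It ignores escaping and the actual Authorizer interface and only shows the matching order being discussed.

    import java.util.Set;

    public class AclMatchSketch {
        static boolean hasMatchingAcl(Set<String> aclResourceNames, String resourceName) {
            for (String aclName : aclResourceNames) {
                if (aclName.equals("*")) return true;            // legacy wildcard ACL
                if (aclName.equals(resourceName)) return true;   // literal match
                if (aclName.endsWith("*")                        // wildcard-suffix (prefix) match
                        && resourceName.startsWith(aclName.substring(0, aclName.length() - 1))) {
                    return true;
                }
            }
            return false;
        }
    }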
Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient
Actually, I just realized that this won't work. The AlterConfigs API is kind of broken right now. DescribeConfigs won't return the "sensitive" configurations like passwords. So doing describe + edit + alter will wipe out all sensitive configs. :( I think we should probably just create a flag for alterConfigs which marks it as incremental, like we discussed earlier, and do this as a compatible change that is needed for the shell command. best, Colin On Thu, May 17, 2018, at 09:32, Colin McCabe wrote: > Hi Viktor, > > Since the KIP freeze is coming up really soon, maybe we should just drop > the section about changes to AlterConfigs from KIP-248. We don't really > need it here, since ConfigCommand can use AlterConfigs as-is. > > We can pick up the discussion about improving AlterConfigs in a future KIP. > > cheers, > Colin > > On Wed, May 16, 2018, at 22:06, Colin McCabe wrote: > > Hi Viktor, > > > > The shell command isn’t that easy to integrate into applications. > > AdminClient will get integrated into a lot more stuff, which > > increases the potential for conflicts. I would argue that we should > > fix this soon. > > If we do want to reduce the scope in this KIP, we could do the merge in > > the ConfigCommand tool for now, and leave AC unchanged. > > Best, > > Colin > > > > > > On Wed, May 16, 2018, at 04:57, Viktor Somogyi wrote: > > > Hi Colin, > > > > > > > Doing get-merge-set is buggy, though. If someone else does get-merge- > > > > set at the same time as you, you might overwrite that person's > > > > changes, or vice versa. So I really don't think we should try to do > > > > this. Also, having both an incremental and a full API is useful, > > > > and it's just a single boolean at the protocol and API level.> > > > Overwriting somebody's change is currently possible with the > > > ConfigCommand, as it will do this get-merge-set behavior on the client> > > > side, in the command. From this perspective I think it's not much > > > different to do this with the admin client. Also I think admins don't> > > > modify the quotas/configs of a client/user/topic/broker often (and > > > multiple admins would do it even more rarely), so I don't think it is> a > > > big issue. What I think would be useful here but may be out of scope> is > > > to version the changes similarly to leader epochs. So when an admin> > > > updates the configs, it will increment a version number and won't let> > > > other admins to push changes in with lower than that. Instead it would> > > > return an error. > > > > > > I would be also interested what others think about this? > > > > > > Cheers, > > > Viktor > > > > > > > > > On Sat, May 12, 2018 at 2:29 AM, Colin McCabe > > > wrote:> > On Wed, May 9, 2018, at 05:41, Viktor > > > Somogyi wrote: > > > >> Hi Colin, > > > >> > > > >> > We are going to need to create a new version of > > > >> > AlterConfigsRequest to add the "incremental" boolean. So while > > > >> > we're doing that, maybe we can change the type to > > > >> > NULLABLE_STRING.> >> > > > >> I was just talking to a colleague yesterday and we came to the > > > >> conclusion that we should keep the boolean flag only on the client> >> > > > >> side (as you may have suggested earlier?) and not make part of the> >> > > > >> protocol as it might lead to a very complicated API on the long > > > >> term.> >> Also we would keep the server side API simpler. 
Instead of > > > >> the > > > >> protocol change we could just simply have the boolean flag in > > > >> AlterConfigOptions and the AdminClient should do the get-merge-set> >> > > > >> logic which corresponds to the behavior of the current > > > >> ConfigCommand.> >> That way we won't need to change the protocol for > > > >> now but > > > >> still have> >> both functionality. What do you think? > > > > > > > > Hi Viktor, > > > > > > > > Doing get-merge-set is buggy, though. If someone else does get-merge- > > > > set at the same time as you, you might overwrite that person's > > > > changes, or vice versa. So I really don't thin
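A minimal Java sketch of the client-side get-merge-set pattern debated above, using the existing AdminClient describeConfigs/alterConfigs calls; the topic name and config key are invented for illustration, and the comments mark where the sensitive-config and lost-update problems bite:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class GetMergeSetSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");

            // GET: read the current configs for the resource.
            Config current = admin.describeConfigs(Collections.singleton(topic))
                    .all().get().get(topic);

            // MERGE: copy the existing entries and overlay the one change.
            // Note: sensitive entries (e.g. passwords) come back with null values from
            // DescribeConfigs, so this merge silently wipes them -- the problem raised above.
            Map<String, ConfigEntry> merged = new HashMap<>();
            for (ConfigEntry entry : current.entries()) {
                merged.put(entry.name(), entry);
            }
            merged.put("retention.ms", new ConfigEntry("retention.ms", "86400000"));

            // SET: write the full merged config back. Another admin doing the same
            // read-modify-write concurrently can overwrite this update, or vice versa.
            admin.alterConfigs(Collections.singletonMap(topic, new Config(merged.values())))
                    .all().get();
        }
    }
}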
Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations
Thanks, Robert! +1 (non-binding) Colin On Thu, May 17, 2018, at 14:15, Robert Yokota wrote: > Hi Colin, > > I've changed the KIP to have a composite object returned from get(). It's > probably the most straightforward option. Please let me know if you have > any other concerns. > > Thanks, > Robert > > On Thu, May 17, 2018 at 11:44 AM, Robert Yokota wrote: > > > > > > > Hi Colin, > > > > My last response was not that clear, so let me back up and explain a bit > > more. > > > > Some secret managers, such as Vault (and maybe Keywhiz) have the notion of > > a lease duration or a TTL for a path. Every path can have a different > > TTL. This is period after which the value of the keys at the given path > > may be invalid. It can be used to indicate a rotation will be done. In > > the cause of the Vault integration with AWS, Vault will actually delete the > > secrets from AWS at the moment the TTL expires. A TTL could be used by > > other ConfigProviders, such as a FileConfigProvider, to indicate that all > > the secrets at a given path (file), will be rotated on a regular basis. > > > > I would like to expose the TTL in the APIs somewhere. The TTL can be made > > available at the time get() is called. Connect already has a built in > > ScheduledExecutor, so Connect can just use the TTL to schedule a Connector > > restart. Originally, I had exposed the TTL in a ConfigContext interface > > passed to the get() method. To reduce the number of APIs, I placed it on > > the onChange() method. This means at the time of get(), onChange() would > > be called with a TTL. The Connector's implementation of the callback would > > use onChange() with the TTL to schedule a restart. > > > > If you think this is overloading onChange() too much, I could add the > > ConfigContext back to get(): > > > > > > Map get(ConfigContext ctx, String path); > > > > public interface ConfigContext { > > > > void willExpire(String path, long ttl); > > > > } > > > > > > > > or I could separate out the TTL method in the callback: > > > > > > public interface ConfigChangeCallback { > > > > void willExpire(String path, long ttl); > > > > void onChange(String path, Map values); > > } > > > > > > > > Or we could return a composite object from get(): > > > > ConfigData get(String path); > > > > public class ConfigData { > > > > Map data; > > long ttl; > > > > } > > > > > > Do you have a preference Colin? > > > > Thanks, > > Robert > > > > > > On Thu, May 17, 2018 at 9:27 AM, Colin McCabe wrote: > > > >> Hi Robert, > >> > >> Hmm. I thought that if you're using ConfigChangeCallback, you are > >> relying on the ConfigProvider to make a callback to you when the > >> configuration has changed. So isn't that always the "push model" (where > >> the ConfigProvider pushes changes to Connect). If you want the "pull > >> model" where you initiate updates, you can simply call ConfigProvider#get > >> directly, right? > >> > >> The actual implementation of ConfigProvider subclasses will depend on the > >> type of configuration storage mechanism on the backend. In the case of > >> Vault, it sounds like we need to have something like a ScheduledExecutor > >> which re-fetches keys after a certain amount of time. > >> > >> As an aside, what does a "lease duration" mean for a configuration key? > >> Does that mean Vault will reject changes to the configuration key if I try > >> to make them within the lease duration? Or is this like a period after > >> which a password is automatically rotated? 
> >> > >> On Wed, May 16, 2018, at 22:25, Robert Yokota wrote: > >> > Hi Colin, > >> > > >> > > With regard to delayMs, can’t we just restart the > >> > > Connector when the keys are actually changed? > >> > > >> > Currently the VaultConfigProvider does not find out when values for keys > >> > have changed. You could do this with a poll model (with a > >> > background thread in the ConfigProvider), but since for each key-value > >> > pair, Vault provides a lease duration stating exactly when a value for a > >> > key will change in the future, this is an alternative model of just > >> passing > >> > the lease
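A rough sketch of the composite-object option the thread converges on, with illustrative (not final) class names: get() returns the values for a path plus an optional TTL, and a caller such as Connect can feed that TTL into its existing ScheduledExecutor to schedule a restart.

import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// The "composite object" shape: get() returns the key/value data at a path together
// with an optional TTL (lease duration) reported by the backing secret store.
interface SecretConfigProvider {
    SecretConfigData get(String path);
}

class SecretConfigData {
    final Map<String, String> data;
    final Long ttlMs;   // null when the store reports no lease for this path

    SecretConfigData(Map<String, String> data, Long ttlMs) {
        this.data = data;
        this.ttlMs = ttlMs;
    }
}

// How a caller such as Connect could use the TTL: schedule a connector restart when
// the lease expires, using the ScheduledExecutor it already has.
class SecretLeaseWatcher {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    void track(SecretConfigProvider provider, String path, Runnable restartConnector) {
        SecretConfigData data = provider.get(path);
        if (data.ttlMs != null) {
            executor.schedule(restartConnector, data.ttlMs, TimeUnit.MILLISECONDS);
        }
    }
}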
Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient
Hi Viktor, Thanks, this looks good. The boolean should default to false if not set, to ensure that existing clients continue to work as-is, right? Might be good to add a note specifying that. +1 (non-binding) best, Colin On Fri, May 18, 2018, at 08:16, Viktor Somogyi wrote: > Updated KIP-248: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-248+-+Create+New+ConfigCommand+That+Uses+The+New+AdminClient > > I'd like to ask project members, committers and contributors to vote > as this would be a useful improvement in Kafka. > > Sections changed: > - Public interfaces: added the bin/scram-credentials.sh command that > we discussed with Colin. > - Wire format types: removed AlterOperations. As discussed with Colin, > we don't actually need this: we should behave in an incremental way in > AlterQuotas. For AlterConfig we'll implement this behavior with an > extra flag on the protocol (and incrementing the version). > - AlterQuotas protocol: removed AlterOperations. Added some more > description to the behavior of the protocol. Removing quotas will > happen by sending a NaN instead of the AlterOperations. (Since IEEE > 754 covers NaNs and it is not a valid config for any quota, I think it > is a good notation.) > - SCRAM: so it will be done by the scram-credentials command that uses > direct zookeeper connection. I think further modes, like doing it > through the broker is not necessary. The idea here is that zookeeper > in this case acts as a credentials store. This should be decoupled > from the broker as we manage broker credentials as well. The new > command acts as a client to the store. > - AlterConfigs will have an incremental_update flag as discussed. By > default it is false to provide the backward compatible behavior. When > it is true it will merge the configs with what's there in the node. > Deletion in incremental mode is done by sending an empty string as > config value. > - Other compatibility changes: this KIP doesn't scope listing multiple > users and client's quotas. As per a conversation with Rajini, it is > not a common use case and we can add it back later if it is needed. If > this functionality is needed, the old code should be still available > through run-kafka-class. (Removed the USE_OLD_KAFKA_CONFIG_COMMAND as > it doesn't make sense anymore.) > > On Fri, May 18, 2018 at 12:33 PM, Viktor Somogyi > wrote: > > Ok, ignore my previous mail (except the last sentence), gmail didn't > > update me about your last email :/. > > > >> I think we should probably just create a flag for alterConfigs which marks > >> it as incremental, like we discussed earlier, and do this as a compatible > >> change that is needed for the shell command. > > > > Alright, I missed that about the sensitive configs too, so in this > > case I can agree with this. I'll update the KIP this afternoon and > > update this thread. > > Thanks again for your contribution. > > > > Viktor > > > > On Fri, May 18, 2018 at 2:34 AM, Colin McCabe wrote: > >> Actually, I just realized that this won't work. The AlterConfigs API is > >> kind of broken right now. DescribeConfigs won't return the "sensitive" > >> configurations like passwords. So doing describe + edit + alter will wipe > >> out all sensitive configs. :( > >> > >> I think we should probably just create a flag for alterConfigs which marks > >> it as incremental, like we discussed earlier, and do this as a compatible > >> change that is needed for the shell command. 
> >> > >> best, > >> Colin > >> > >> > >> On Thu, May 17, 2018, at 09:32, Colin McCabe wrote: > >>> Hi Viktor, > >>> > >>> Since the KIP freeze is coming up really soon, maybe we should just drop > >>> the section about changes to AlterConfigs from KIP-248. We don't really > >>> need it here, since ConfigCommand can use AlterConfigs as-is. > >>> > >>> We can pick up the discussion about improving AlterConfigs in a future > >>> KIP. > >>> > >>> cheers, > >>> Colin > >>> > >>> On Wed, May 16, 2018, at 22:06, Colin McCabe wrote: > >>> > Hi Viktor, > >>> > > >>> > The shell command isn’t that easy to integrate into applications. > >>> > AdminClient will get integrated into a lot more stuff, which > >>> > increases the potential for conflicts. I would argue that we should > >>> > fix this soon. > >>> > If we
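A small sketch of the incremental merge semantics described in the updated KIP text above (names are illustrative, not the actual broker code): with the flag off the submitted map replaces the stored config, and with it on the submitted keys are overlaid and an empty string deletes a key.

import java.util.HashMap;
import java.util.Map;

final class IncrementalAlterSketch {
    // Applies a submitted config map to the stored one, per the semantics above.
    static Map<String, String> apply(Map<String, String> existing,
                                     Map<String, String> submitted,
                                     boolean incrementalUpdate) {
        if (!incrementalUpdate) {
            // Default (flag = false): today's behavior, the submitted map replaces
            // the stored config wholesale.
            return new HashMap<>(submitted);
        }
        // Incremental mode: keep what is already stored, overlay the submitted keys,
        // and treat an empty string value as a deletion marker.
        Map<String, String> result = new HashMap<>(existing);
        submitted.forEach((key, value) -> {
            if (value.isEmpty()) {
                result.remove(key);
            } else {
                result.put(key, value);
            }
        });
        return result;
    }
}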
Re: [VOTE] KIP-290: Support for wildcard suffixed ACLs
Hmm, do we still need one more binding +1, or did I misread the vote thread? +1 (non-binding) from me. As I posted in the other thread, I think it would make sense to name the new ZK hierarchy /kafka-prefix-acls or similar, to reflect the fact that they are ACLs that match by a name prefix. They're not really "wildcards" It would also be nice to have a MatchType enum in the Resource class like LITERAL (for existing ACLs) and PREFIX (for the new ACL type). This would put us in a good position to implement new ACL types in the future if the desire arises, and also clearly explain how each type works. best, Colin On Mon, May 21, 2018, at 13:40, Piyush Vijay wrote: > Thanks for the +1s. I'll send out a PR shortly. > > Thanks > > > Piyush Vijay > > On Mon, May 21, 2018 at 8:59 AM, Guozhang Wang wrote: > > > Thanks for the KIP, +1 from me (binding). > > > > > > Guozhang > > > > On Mon, May 21, 2018 at 5:22 AM, Damian Guy wrote: > > > > > +1 (binding) > > > > > > On Sat, 19 May 2018 at 03:51 Piyush Vijay > > wrote: > > > > > > > Hi everyone, > > > > > > > > I would like to start a vote for > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > 290%3A+Support+for+wildcard+suffixed+ACLs > > > > . > > > > > > > > The KIP proposes a way to support wildcard-suffixed resource names in > > > Kafka > > > > ACLs. > > > > > > > > The main challenge was to support it in a backward compatible way > > because > > > > resources like consumer groups don't have defined naming convention and > > > can > > > > have '*' in their names. > > > > > > > > Please take a look. > > > > > > > > Thanks > > > > > > > > Piyush Vijay > > > > > > > > > > > > > > > -- > > -- Guozhang > >
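A minimal illustration of the MatchType idea suggested above; the names and shape are hypothetical, since the final API was still under discussion at this point.

// LITERAL covers today's exact-name ACLs; PREFIX covers the new name-prefix ACLs.
enum AclMatchType {
    LITERAL,
    PREFIX
}

final class ResourceSketch {
    final String resourceType;   // e.g. "Topic" or "Group"
    final String name;           // an exact name, or a prefix such as "teamA"
    final AclMatchType matchType;

    ResourceSketch(String resourceType, String name, AclMatchType matchType) {
        this.resourceType = resourceType;
        this.name = name;
        this.matchType = matchType;
    }

    boolean matches(String requestedName) {
        return matchType == AclMatchType.LITERAL
                ? name.equals(requestedName)
                : requestedName.startsWith(name);
    }
}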
Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs
Oops, will post on the vote thread. best, Colin On Mon, May 21, 2018, at 21:15, Colin McCabe wrote: > On Mon, May 21, 2018, at 04:53, Andy Coates wrote: > > Hey Piyush, > > > > Thanks for the updated KIP! Couple of minor points from me: > > > > When storing wildcard-suffixed Acls in ZK, drop the asterisk of the end for > > the path, e.g. change "*/kafka-wildcard-acl/Topic/teamA*" * to "*/* > > *kafka-wildcard-acl**/Topic/**teamA"*. This reduces error conditions, i.e. > > this is a place for storing wildcard-suffixed Acls, so it implicitly ends > > in an asterisk. If you include the asterisk in the path then you need to > > validate each entry, when reading, ends with an asterisk, and do something > > if they don't. If you don't include the training asterisk then there is > > nothing to validate and you can treat the prefix as a literal, (i.e. no > > escaping needed). TBH I'd probably drop the asterisk from the in-memory > > representation as well, for the same reason. > > Hi Andy, > > I agree. If everything in ZK under /kafka-wildcard-acl/ is a prefix > ACL, there is no need to include the star at the end. And really, it > should be called something like /kafka-prefix-acl/, since it's only > vaguely related to the idea of wildcards. > > > > > Rather than creating an enum to indicate the type of a resource, you could > > instead use polymorphism, e.g. make Resource an interface and have two > > implementations: LiteralResource and WildcardSuffixedResource. This is > > also extensible, but may also allow for a cleaner implementation. > > Since Resource is a concrete class now, we can't make it an interface > without breaking API compatibility. > > Even if it were possible to do compatibly, I would argue it's a bad > idea. If we need to add another bit of state like case insensitivity, > we don't want to have LiteralCaseInsensistiveResource, > WildcardSuffixedCaseInsensitiveResource, etc. etc. You need 2^n > subclasses classes to represent N bits of state. > > I would argue that there should be a field in Resource like NameType > which can be LITERAL or PREFIX. That leaves us in a good position when > someone inevitably comes up with a new NameType. > > Does this still have a chance to get in, or has the KIP window closed? > I would argue with one or two minor changes it's ready to go. Pretty > much all of the compatibility problems are solved with the separate ZK > hierarchy. > > best, > Colin > > > > > Andy > > > > On 21 May 2018 at 01:58, Rajini Sivaram wrote: > > > > > Hi Piyush, Thanks for the KIP! > > > > > > +1 (binding) > > > > > > Regards, > > > > > > Rajini > > > > > > On Sun, May 20, 2018 at 2:53 PM, Andy Coates wrote: > > > > > > > Awesome last minute effort Piyush. > > > > > > > > Really appreciate your time and input, > > > > > > > > Andy > > > > > > > > Sent from my iPhone > > > > > > > > > On 19 May 2018, at 03:43, Piyush Vijay wrote: > > > > > > > > > > Updated the KIP. > > > > > > > > > > 1. New enum field 'ResourceNameType' in Resource and ResourceFilter > > > > classes. > > > > > 2. modify getAcls() and rely on ResourceNameType' field in Resource to > > > > > return either exact matches or all matches based on wildcard-suffix. > > > > > 3. CLI changes to identify if resource name is literal or > > > wildcard-suffix > > > > > 4. Escaping doesn't work and isn't required if we're keeping a > > > > > separate > > > > > path on ZK (kafka-wildcard-acl) to store wildcard-suffix ACLs. > > > > > 5. 
New API keys for Create / Delete / Describe Acls request with a new > > > > > field in schemas for 'ResourceNameType'. > > > > > > > > > > Looks ready to me for the vote, will start voting thread now. Thanks > > > > > everyone for the valuable feedback. > > > > > > > > > > > > > > > > > > > > > > > > > Piyush Vijay > > > > > > > > > > > > > > > Piyush Vijay > > > > > > > > > >> On Fri, May 18, 2018 at 6:07 PM, Andy Coates > > > wrote: > > > > >> > > > > >> Hi Piyush, > > > > >>
Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs
On Mon, May 21, 2018, at 04:53, Andy Coates wrote: > Hey Piyush, > > Thanks for the updated KIP! Couple of minor points from me: > > When storing wildcard-suffixed Acls in ZK, drop the asterisk of the end for > the path, e.g. change "*/kafka-wildcard-acl/Topic/teamA*" * to "*/* > *kafka-wildcard-acl**/Topic/**teamA"*. This reduces error conditions, i.e. > this is a place for storing wildcard-suffixed Acls, so it implicitly ends > in an asterisk. If you include the asterisk in the path then you need to > validate each entry, when reading, ends with an asterisk, and do something > if they don't. If you don't include the training asterisk then there is > nothing to validate and you can treat the prefix as a literal, (i.e. no > escaping needed). TBH I'd probably drop the asterisk from the in-memory > representation as well, for the same reason. Hi Andy, I agree. If everything in ZK under /kafka-wildcard-acl/ is a prefix ACL, there is no need to include the star at the end. And really, it should be called something like /kafka-prefix-acl/, since it's only vaguely related to the idea of wildcards. > > Rather than creating an enum to indicate the type of a resource, you could > instead use polymorphism, e.g. make Resource an interface and have two > implementations: LiteralResource and WildcardSuffixedResource. This is > also extensible, but may also allow for a cleaner implementation. Since Resource is a concrete class now, we can't make it an interface without breaking API compatibility. Even if it were possible to do compatibly, I would argue it's a bad idea. If we need to add another bit of state like case insensitivity, we don't want to have LiteralCaseInsensistiveResource, WildcardSuffixedCaseInsensitiveResource, etc. etc. You need 2^n subclasses classes to represent N bits of state. I would argue that there should be a field in Resource like NameType which can be LITERAL or PREFIX. That leaves us in a good position when someone inevitably comes up with a new NameType. Does this still have a chance to get in, or has the KIP window closed? I would argue with one or two minor changes it's ready to go. Pretty much all of the compatibility problems are solved with the separate ZK hierarchy. best, Colin > > Andy > > On 21 May 2018 at 01:58, Rajini Sivaram wrote: > > > Hi Piyush, Thanks for the KIP! > > > > +1 (binding) > > > > Regards, > > > > Rajini > > > > On Sun, May 20, 2018 at 2:53 PM, Andy Coates wrote: > > > > > Awesome last minute effort Piyush. > > > > > > Really appreciate your time and input, > > > > > > Andy > > > > > > Sent from my iPhone > > > > > > > On 19 May 2018, at 03:43, Piyush Vijay wrote: > > > > > > > > Updated the KIP. > > > > > > > > 1. New enum field 'ResourceNameType' in Resource and ResourceFilter > > > classes. > > > > 2. modify getAcls() and rely on ResourceNameType' field in Resource to > > > > return either exact matches or all matches based on wildcard-suffix. > > > > 3. CLI changes to identify if resource name is literal or > > wildcard-suffix > > > > 4. Escaping doesn't work and isn't required if we're keeping a separate > > > > path on ZK (kafka-wildcard-acl) to store wildcard-suffix ACLs. > > > > 5. New API keys for Create / Delete / Describe Acls request with a new > > > > field in schemas for 'ResourceNameType'. > > > > > > > > Looks ready to me for the vote, will start voting thread now. Thanks > > > > everyone for the valuable feedback. 
> > > > > > > > > > > > > > > > > > > > Piyush Vijay > > > > > > > > > > > > Piyush Vijay > > > > > > > >> On Fri, May 18, 2018 at 6:07 PM, Andy Coates > > wrote: > > > >> > > > >> Hi Piyush, > > > >> > > > >> We're fast approaching the KIP deadline. Are you actively working on > > > this? > > > >> If you're not I can take over. > > > >> > > > >> Thanks, > > > >> > > > >> Andy > > > >> > > > >>> On 18 May 2018 at 14:25, Andy Coates wrote: > > > >>> > > > >>> OK I've read it now. > > > >>> > > > >>> 1. I see you have an example: > > > >>>> For example: If I want to fetch all ACLs that match ’topicA*’
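A sketch of the ZK layout agreed on above, assuming a root such as /kafka-prefix-acl: the node name is the prefix itself with no trailing asterisk, so reads need no validation or escaping and matching is a plain startsWith check. The paths and helper names are illustrative only.

final class PrefixAclPaths {
    static final String ROOT = "/kafka-prefix-acl";

    // e.g. pathFor("Topic", "teamA") -> "/kafka-prefix-acl/Topic/teamA"
    static String pathFor(String resourceType, String prefix) {
        return ROOT + "/" + resourceType + "/" + prefix;
    }

    // Everything after the last '/' is the literal prefix -- there is no '*' to strip
    // or validate, because everything under ROOT is a prefix ACL by definition.
    static String prefixFromPath(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    static boolean aclApplies(String storedPrefix, String resourceName) {
        return resourceName.startsWith(storedPrefix);
    }
}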
Re: [VOTE] KIP-266: Add TimeoutException for KafkaConsumer#position
As Jason noted earlier, request.timeout.ms controls something different: the amount of time we're willing to wait for an RPC to complete. Empirically, RPCs sometimes hang for a long time. If the API timeout is the same as the RPC timeout, we are not robust against this failure condition. The whole call fails rather than trying another server or a new socket. In order to fix this, we must make the API timeout longer than the RPC timeout. However, we have a lot of users who have been trained to use request.timeout.ms to make their clients time out earlier. If we introduce a new config to do this instead, it's kind of a breaking change for them. Perhaps we should go the other direction and create a new configuration like rpc.timeout.ms to do what request.timeout.ms is doing now. Then request.timeout.ms can become what users already think it is: the timeout for their API calls. best, Colin On Tue, Jun 5, 2018, at 15:29, Ted Yu wrote: > bq. we were already doing with request.timeout.ms > > I would vote for using existing config. > > Any new config parameter needs to go thru long process of digestion: > documentation, etc in order for users to understand and familiarize. > > The existing config would have lower mismatch of impedance. > > Cheers > > On Tue, Jun 5, 2018 at 3:17 PM, Jason Gustafson wrote: > > > Thanks for the comments. I'm not sure I understand why we want to avoid the > > term "timeout." Semantically, that's what it is. If we don't want another > > timeout config, we could avoid it and hard-code a reasonable default or try > > to wrap the behavior into one of the other timeouts (which is sort of what > > we were already doing with request.timeout.ms). But I'm not too sure > > calling it something else addresses the problem. > > > > -Jason > > > > On Tue, Jun 5, 2018 at 2:59 PM, Dhruvil Shah wrote: > > > > > I agree that using `default.timeout.ms` could cause confusion since we > > > already have other timeout configurations in the consumer. > > > > > > +1 for using `default.block.ms`. > > > > > > Thanks, > > > Dhruvil > > > > > > On Tue, Jun 5, 2018 at 11:48 AM, Bill Bejeck wrote: > > > > > > > Hi Jason, > > > > > > > > At first, I thought the same name between the producer and the consumer > > > was > > > > ideal. > > > > > > > > But your comment makes me realize consistent names with different > > > semantics > > > > is even more confusing. > > > > > > > > I'm +1 for not using `max.block.ms`. I like Guozhang's suggestion of > > ` > > > > default.block.ms` for the name. > > > > > > > > Thanks, > > > > Bill > > > > > > > > On Tue, Jun 5, 2018 at 1:33 PM, Guozhang Wang > > > wrote: > > > > > > > > > Hi Jason, > > > > > > > > > > Yeah I agree that "max.block.ms" makes people thinking of the > > > producer's > > > > > config with the same name, but their semantics are different. > > > > > > > > > > On the other hand, I'm a bit concerned with the reusing of the term > > > > > `timeout` as we already have `session.timeout.ms` and ` > > > > request.timeout.ms` > > > > > in ConsumerConfig.. How about using the name `default.api.block.ms` > > or > > > > > simply `default.block.ms`? > > > > > > > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > > > > On Tue, Jun 5, 2018 at 8:57 AM, Jason Gustafson > > > > > wrote: > > > > > > > > > > > Hey All, > > > > > > > > > > > > One more minor follow-up. 
As I was reviewing the change mentioned > > > > above, > > > > > I > > > > > > felt the name `max.block.ms` was a little bit misleading since it > > > only > > > > > > applies to methods which do not have an explicit timeout. A clearer > > > > name > > > > > > given its usage might be `default.timeout.ms`. It is the default > > > > timeout > > > > > > for any blocking API which does not have a timeout. I'm leaning > > > toward > > > > > > using this name since the current one seems likely to cause > > > confusion. > > > > > Any > > > > > > t
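A simplified sketch, not actual NetworkClient code, of why the API timeout needs to exceed the RPC timeout: within one API deadline the client can retry a hung RPC against another broker instead of failing the whole call. The Broker type and sendRpc call are stand-ins for illustration.

import java.util.List;
import java.util.concurrent.TimeoutException;

final class ApiVsRpcTimeoutSketch {
    interface Broker {
        String sendRpc(long rpcTimeoutMs) throws TimeoutException;
    }

    static String callWithRetries(List<Broker> brokers, long apiTimeoutMs, long rpcTimeoutMs)
            throws TimeoutException {
        long deadline = System.currentTimeMillis() + apiTimeoutMs;
        int attempt = 0;
        while (System.currentTimeMillis() < deadline) {
            Broker broker = brokers.get(attempt++ % brokers.size());
            try {
                // Each attempt is bounded by the shorter per-RPC timeout...
                return broker.sendRpc(rpcTimeoutMs);
            } catch (TimeoutException e) {
                // ...so a hung connection costs one attempt, not the whole API call.
            }
        }
        throw new TimeoutException("API call did not complete within " + apiTimeoutMs + " ms");
    }
}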
Re: [VOTE] KIP-266: Add TimeoutException for KafkaConsumer#position
I don't think it can be fixed. The RPC duration is something that you might reasonably want to tune. For example, if it's too low, you might not be able to make progress at all on a heavily loaded server. We could probably come up with a good default, however. rpc.timeout.ms could be set to something like max(1000, 0.5 * request.timeout.ms) best, Colin On Tue, Jun 5, 2018, at 16:21, Ted Yu wrote: > bq. we must make the API timeout longer than the RPC timeout > > I agree with the above. > > How about adding a fixed duration on top of request.timeout.ms ? > > Thanks > > On Tue, Jun 5, 2018 at 4:18 PM, Colin McCabe wrote: > > > As Jason noted earlier, request.timeout.ms controls something different: > > the amount of time we're willing to wait for an RPC to complete. > > > > Empirically, RPCs sometimes hang for a long time. If the API timeout is > > the same as the RPC timeout, we are not robust against this failure > > condition. The whole call fails rather than trying another server or a new > > socket. > > > > In order to fix this, we must make the API timeout longer than the RPC > > timeout. > > > > However, we have a lot of users who have been trained to use > > request.timeout.ms to make their clients time out earlier. If we > > introduce a new config to do this instead, it's kind of a breaking change > > for them. Perhaps we should go the other direction and create a new > > configuration like rpc.timeout.ms to do what request.timeout.ms is doing > > now. Then request.timeout.ms can become what users already think it is: > > the timeout for their API calls. > > > > best, > > Colin > > > > > > On Tue, Jun 5, 2018, at 15:29, Ted Yu wrote: > > > bq. we were already doing with request.timeout.ms > > > > > > I would vote for using existing config. > > > > > > Any new config parameter needs to go thru long process of digestion: > > > documentation, etc in order for users to understand and familiarize. > > > > > > The existing config would have lower mismatch of impedance. > > > > > > Cheers > > > > > > On Tue, Jun 5, 2018 at 3:17 PM, Jason Gustafson > > wrote: > > > > > > > Thanks for the comments. I'm not sure I understand why we want to > > avoid the > > > > term "timeout." Semantically, that's what it is. If we don't want > > another > > > > timeout config, we could avoid it and hard-code a reasonable default > > or try > > > > to wrap the behavior into one of the other timeouts (which is sort of > > what > > > > we were already doing with request.timeout.ms). But I'm not too sure > > > > calling it something else addresses the problem. > > > > > > > > -Jason > > > > > > > > On Tue, Jun 5, 2018 at 2:59 PM, Dhruvil Shah > > wrote: > > > > > > > > > I agree that using `default.timeout.ms` could cause confusion since > > we > > > > > already have other timeout configurations in the consumer. > > > > > > > > > > +1 for using `default.block.ms`. > > > > > > > > > > Thanks, > > > > > Dhruvil > > > > > > > > > > On Tue, Jun 5, 2018 at 11:48 AM, Bill Bejeck > > wrote: > > > > > > > > > > > Hi Jason, > > > > > > > > > > > > At first, I thought the same name between the producer and the > > consumer > > > > > was > > > > > > ideal. > > > > > > > > > > > > But your comment makes me realize consistent names with different > > > > > semantics > > > > > > is even more confusing. > > > > > > > > > > > > I'm +1 for not using `max.block.ms`. I like Guozhang's > > suggestion of > > > > ` > > > > > > default.block.ms` for the name. 
> > > > > > > > > > > > Thanks, > > > > > > Bill > > > > > > > > > > > > On Tue, Jun 5, 2018 at 1:33 PM, Guozhang Wang > > > > > wrote: > > > > > > > > > > > > > Hi Jason, > > > > > > > > > > > > > > Yeah I agree that "max.block.ms" makes people thinking of the > > > > > producer's > > > > > > > config with the same name, but their
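Worked numbers for the default proposed above, assuming a hypothetical rpc.timeout.ms derived as max(1000, 0.5 * request.timeout.ms):

final class RpcTimeoutDefaultSketch {
    // max(1000, 0.5 * request.timeout.ms)
    static long defaultRpcTimeoutMs(long requestTimeoutMs) {
        return Math.max(1000L, requestTimeoutMs / 2);
    }

    public static void main(String[] args) {
        System.out.println(defaultRpcTimeoutMs(30000));  // 15000 -- the 15 s figure discussed below
        System.out.println(defaultRpcTimeoutMs(1500));   // 1000  -- the floor kicks in
    }
}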
Re: Someone to review KAFKA-6919, one line change for faulty documentation
Thanks, Koen. I was on vacation, so I missed this originally. But I'll review it now. cheers, Colin On Fri, Jun 1, 2018, at 13:02, Koen De Groote wrote: > Greetings, > > Poking for someone to have a quick look at this. It's a one-line change. I > noticed the documentation about trogdor was pointing to a non-existing > folder. > > Ticket: https://issues.apache.org/jira/browse/KAFKA-6919 > > PR: https://github.com/apache/kafka/pull/5040 > > Thanks.
Re: [VOTE] KIP-266: Add TimeoutException for KafkaConsumer#position
On Tue, Jun 5, 2018, at 16:35, Ted Yu wrote: > bq. could probably come up with a good default > > That's the difficult part. > > bq. max(1000, 0.5 * request.timeout.ms) > > Looking at some existing samples: > In tests/kafkatest/tests/connect/templates/connect-distributed.properties , > we have: > request.timeout.ms=3 > > Isn't the above formula putting an upper bound 15000 for the RPC timeout ? Correct. It would put a 15 second default on the RPC timeout in this case. If that's too short, the user has the option to change it. If we feel that 15 seconds is too short, we could put a floor of 30 seconds or whatever on the RPC timeout, instead of 1 second. > > By fixed duration, I meant something like > request.timeout.ms + 15000 The RPC timeout should be shorter than the request timeout, so that we get multiple tries if the RPC hangs due to network issues. It should not be longer. best, Colin > > Cheers > > On Tue, Jun 5, 2018 at 4:27 PM, Colin McCabe wrote: > > > I don't think it can be fixed. The RPC duration is something that you > > might reasonably want to tune. For example, if it's too low, you might not > > be able to make progress at all on a heavily loaded server. > > > > We could probably come up with a good default, however. rpc.timeout.ms > > could be set to something like > > max(1000, 0.5 * request.timeout.ms) > > > > best, > > Colin > > > > > > On Tue, Jun 5, 2018, at 16:21, Ted Yu wrote: > > > bq. we must make the API timeout longer than the RPC timeout > > > > > > I agree with the above. > > > > > > How about adding a fixed duration on top of request.timeout.ms ? > > > > > > Thanks > > > > > > On Tue, Jun 5, 2018 at 4:18 PM, Colin McCabe wrote: > > > > > > > As Jason noted earlier, request.timeout.ms controls something > > different: > > > > the amount of time we're willing to wait for an RPC to complete. > > > > > > > > Empirically, RPCs sometimes hang for a long time. If the API timeout > > is > > > > the same as the RPC timeout, we are not robust against this failure > > > > condition. The whole call fails rather than trying another server or > > a new > > > > socket. > > > > > > > > In order to fix this, we must make the API timeout longer than the RPC > > > > timeout. > > > > > > > > However, we have a lot of users who have been trained to use > > > > request.timeout.ms to make their clients time out earlier. If we > > > > introduce a new config to do this instead, it's kind of a breaking > > change > > > > for them. Perhaps we should go the other direction and create a new > > > > configuration like rpc.timeout.ms to do what request.timeout.ms is > > doing > > > > now. Then request.timeout.ms can become what users already think it > > is: > > > > the timeout for their API calls. > > > > > > > > best, > > > > Colin > > > > > > > > > > > > On Tue, Jun 5, 2018, at 15:29, Ted Yu wrote: > > > > > bq. we were already doing with request.timeout.ms > > > > > > > > > > I would vote for using existing config. > > > > > > > > > > Any new config parameter needs to go thru long process of digestion: > > > > > documentation, etc in order for users to understand and familiarize. > > > > > > > > > > The existing config would have lower mismatch of impedance. > > > > > > > > > > Cheers > > > > > > > > > > On Tue, Jun 5, 2018 at 3:17 PM, Jason Gustafson > > > > wrote: > > > > > > > > > > > Thanks for the comments. I'm not sure I understand why we want to > > > > avoid the > > > > > > term "timeout." Semantically, that's what it is. 
If we don't want > > > > another > > > > > > timeout config, we could avoid it and hard-code a reasonable > > default > > > > or try > > > > > > to wrap the behavior into one of the other timeouts (which is sort > > of > > > > what > > > > > > we were already doing with request.timeout.ms). But I'm not too > > sure > > > > > > calling it something else addresses the problem. > > > > > > > > > > > > -Jason > > >
Re: [VOTE] KIP-266: Add TimeoutException for KafkaConsumer#position
On Wed, Jun 6, 2018, at 13:10, Guozhang Wang wrote: > The reason that I'm hesitant to use the term "timeout" is that it's being > over-used for multiple semantics: request RPC timeout, consumer session > heartbeat liveness "timeout", and API blocking timeout. We can argue that > in English both of them are indeed called a "timeout" value, but personally > I'm afraid for normal users having the same word `timeout` would be > confusing, and hence I'm proposing for using a slight different term. Hmm. I can see why you want to have a different-sounding name from the existing timeouts. However, I think it could be less clear to omit the word timeout. If your operation times out, and you get a TimeoutException, what configuration do you raise? The timeout. If the configuration name doesn't tell you that it's a timeout, it's harder to understand what needs to be changed. For example, if "group.min.session.timeout.ms" was called something like "group.min.session.block.ms" or "group.min.session.heartbeat.ms", it would not be as clear. > Comparing with adding a new config, I'm actually more concerned about > leveraging the request.timeout value for a default blocking timeout, since > the default value is hard to decide, since for different blocking calls, it > may have different rpc round trips behind the scene, so simply setting it > as request.timeout + a delta may not be always good enough. Yes, I agree that we need a new configuration key. I don't think we should try to hard-code this. I think we should just bite the bullet and create a new configuration key like "default.api.timeout.ms" that sets the default timeout for API calls. The hard part is adding the new configuration in a way that doesn't disrupt existing configurations. There are at least a few cases to worry about: 1. Someone uses the default (pretty long) timeouts for everything. 2. Someone has configured a short request.timeout.ms, in an effort to see failures more quickly 3. Someone has configured a very long (or maybe infinite) request.timeout.ms Case #2 is probably the one which is hardest to support well. We could probably do it with logic like this: A. If default.api.timeout.ms is explicitly set, we use that value. otherwise... B. If request.timeout.ms is longer than 2 minutes, we set default.api.timeout.ms to request.timeout.ms + 1500. otherwise... we set default.api.timeout.ms to request.timeout.ms best, Colin > > > Guozhang > > > On Tue, Jun 5, 2018 at 5:18 PM, Ted Yu wrote: > > > I see where the 0.5 in your previous response came about. > > > > The reason I wrote 'request.timeout.ms + 15000' was that I treat this > > value > > in place of the default.block.ms > > According to your earlier description: > > > > bq. request.timeout.ms controls something different: the amount of time > > we're willing to wait for an RPC to complete. > > > > Basically we're in agreement. It is just that figuring out good default is > > non-trivial. > > > > On Tue, Jun 5, 2018 at 4:44 PM, Colin McCabe wrote: > > > > > On Tue, Jun 5, 2018, at 16:35, Ted Yu wrote: > > > > bq. could probably come up with a good default > > > > > > > > That's the difficult part. > > > > > > > > bq. max(1000, 0.5 * request.timeout.ms) > > > > > > > > Looking at some existing samples: > > > > In tests/kafkatest/tests/connect/templates/connect-distributed. > > properties > > > , > > > > we have: > > > > request.timeout.ms=3 > > > > > > > > Isn't the above formula putting an upper bound 15000 for the RPC > > timeout > > > ? > > > > > > Correct. 
It would put a 15 second default on the RPC timeout in this > > > case. If that's too short, the user has the option to change it. > > > > > > If we feel that 15 seconds is too short, we could put a floor of 30 > > > seconds or whatever on the RPC timeout, instead of 1 second. > > > > > > > > > > > By fixed duration, I meant something like > > > > request.timeout.ms + 15000 > > > > > > The RPC timeout should be shorter than the request timeout, so that we > > get > > > multiple tries if the RPC hangs due to network issues. It should not be > > > longer. > > > > > > best, > > > Colin > > > > > > > > > > > Cheers > > > > > > > > On Tue, Jun 5, 2018 at 4:27 PM, Colin McCabe > > wrote: > &
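The fallback rule spelled out above, restated as a small sketch; default.api.timeout.ms here is the proposed name rather than a shipped config, and the 1500 ms delta is taken verbatim from the message.

final class DefaultApiTimeoutSketch {
    static final long TWO_MINUTES_MS = 2 * 60 * 1000L;

    static long defaultApiTimeoutMs(Long explicitDefaultApiTimeoutMs, long requestTimeoutMs) {
        // Case A: an explicitly configured default.api.timeout.ms always wins.
        if (explicitDefaultApiTimeoutMs != null) {
            return explicitDefaultApiTimeoutMs;
        }
        // Case B: a long request.timeout.ms gets a small delta added on top.
        if (requestTimeoutMs > TWO_MINUTES_MS) {
            return requestTimeoutMs + 1500;
        }
        // Otherwise: mirror request.timeout.ms, so users who shortened it to fail
        // fast keep that behavior.
        return requestTimeoutMs;
    }
}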
Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations
Sounds good. Thanks, Konstantin. Colin On Mon, Jun 11, 2018, at 13:41, Rajini Sivaram wrote: > Hi Konstantine, > > Sounds reasonable to me too. > > Regards, > > Rajini > > On Mon, Jun 11, 2018 at 7:55 PM, Robert Yokota wrote: > > > Hi Konstantine, > > > > Sounds reasonable! > > > > Thanks, > > Robert > > > > On Mon, Jun 11, 2018 at 11:49 AM, Konstantine Karantasis < > > konstant...@confluent.io> wrote: > > > > > Hi everyone, after fixing an issue with a regular expression in Connect's > > > class loading isolation of the new component type ConfigProvider here: > > > > > > https://github.com/apache/kafka/pull/5177 > > > > > > I noticed that the new interface ConfigProvider, along with its first > > > implementation FileConfigProvider, have been placed in the package: > > > > > > org.apache.kafka.common.config > > > > > > This specific package is mentioned in KIP-297 is a few places, but not in > > > any code snippets. I'd like to suggest moving the interface and any > > current > > > of future implementation classes in a new package named: > > > > > > org.apache.kafka.common.config.provider > > > > > > and update the KIP document accordingly. > > > > > > This seems to make sense in general. But, specifically, in Connect it is > > > desired since we treat ConfigProvider implementations as Connect > > components > > > that are loaded in isolation. Having a package for config providers will > > > allow us to avoid making any assumptions with respect to the name of a > > > class that implements `ConfigProvider` and is included in Apache Kafka. > > It > > > will suffice for this class to reside in the package > > > org.apache.kafka.common.config.provider. > > > > > > Let me know if this is a reasonable request and if you agree on amending > > > the KIP description. > > > > > > - Konstantine > > > > > > > > > > > > On Wed, May 16, 2018 at 10:33 AM, Rajini Sivaram < > > rajinisiva...@gmail.com> > > > wrote: > > > > > > > Thanks for the update, Robert. Looks good to me. > > > > > > > > Regards, > > > > > > > > Rajini > > > > > > > > On Wed, May 16, 2018 at 4:43 PM, Robert Yokota > > > wrote: > > > > > > > > > Hi Rajini, > > > > > > > > > > Thanks for the excellent feedback! > > > > > > > > > > I've made the API changes that you've requested in the KIP. > > > > > > > > > > > > > > > > 1. Are we expecting one provider instance with different contexts > > > > > > provided to `ConfigProvider.get()`? If we created a different > > > provider > > > > > > instance for each context, we could deal with scheduling reloads in > > > the > > > > > > provider implementation? > > > > > > > > > > Yes, there would be one provider instance. I've collapsed the > > > > > ConfigContext and the ConfigChangeCallback by adding a parameter > > > delayMs > > > > to > > > > > indicate when the change will happen. When a particular > > ConfigProvider > > > > > retrieves a lease duration along with a key, it can either 1) > > > schedule a > > > > > background thread to push out the change when it happens (at which > > time > > > > the > > > > > delayMs will be 0), or invoke the callback immediately with the lease > > > > > duration set as delayMs (of course, in this case the values for the > > > keys > > > > > will be the old values). A ConfProvider could be parameterized to do > > > one > > > > > or the other. > > > > > > > > > > > > > > > > 2. Couldn't ConfigData be an interface that just returns a map of > > > > > > key-value pairs. 
Providers that return metadata could extend it to > > > > > provide > > > > > > metadata in a meaningful format instead of Map. > > > > > > > > > > I've replaced ConfigData with Map as you suggested. > > > > > > > > > > > > > > > > 3. For ZK, we would use ConfigProvider.get() without `keys` to get > > > all > > > >
Re: SASL Unit test failing
On trunk, testMultipleServerMechanisms failed for me, as well as testAuthenticateCallbackHandlerMechanisms and testMechanismPluggability. org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest > testMultipleServerMechanisms FAILED java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.kafka.common.network.NetworkTestUtils.waitForChannelReady(NetworkTestUtils.java:79) at org.apache.kafka.common.network.NetworkTestUtils.checkClientConnection(NetworkTestUtils.java:52) at org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.testMultipleServerMechanisms(SaslAuthenticatorTest.java:281) org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest > testAuthenticateCallbackHandlerMechanisms FAILED java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.kafka.common.network.NetworkTestUtils.waitForChannelReady(NetworkTestUtils.java:79) at org.apache.kafka.common.network.NetworkTestUtils.checkClientConnection(NetworkTestUtils.java:52) at org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.createAndCheckClientConnection(SaslAuthenticatorTest.java:1475) at org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.testAuthenticateCallbackHandlerMechanisms(SaslAuthenticatorTest.java:776) org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest > testMechanismPluggability FAILED java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.kafka.common.network.NetworkTestUtils.waitForChannelReady(NetworkTestUtils.java:79) at org.apache.kafka.common.network.NetworkTestUtils.checkClientConnection(NetworkTestUtils.java:52) at org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.createAndCheckClientConnection(SaslAuthenticatorTest.java:1475) at org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.testMechanismPluggability(SaslAuthenticatorTest.java:257) best, Colin On Mon, Jun 25, 2018, at 18:02, Ted Yu wrote: > I ran the test on Linux as well. > > cat /etc/redhat-release > CentOS Linux release 7.2.1511 (Core) > > Java version: 1.8.0_161, vendor: Oracle Corporation > Java home: /jdk1.8.0_161/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-327.28.3.el7.x86_64", arch: "amd64", > family: "unix" > > On Mon, Jun 25, 2018 at 5:42 PM, Ted Yu wrote: > > > Here was the command I used: > > > > ./gradlew -Dtest.single=SaslAuthenticatorTest clients:test > > > > On Mon, Jun 25, 2018 at 5:39 PM, Ahmed A wrote: > > > >> I ran test with -i option as follows - "./gradlew -i test". The same set > >> of three tests failed. > >> > >> My environment: > >> $ java -version > >> java version "1.8.0_121" > >> Java(TM) SE Runtime Environment (build 1.8.0_121-b13) > >> Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode) > >> > >> $ cat /etc/redhat-release > >> Red Hat Enterprise Linux Workstation release 7.3 (Maipo) > >> $ uname -a > >> Linux ahmed 3.10.0-514.36.5.el7.x86_64 #1 SMP Thu Dec 28 21:42:18 EST > >> 2017 x86_64 x86_64 x86_64 GNU/Linux > >> > >> > >> Can you please let me know how I can run an individual unit test, what > >> options do I provide? > >> > >> > >> Thank you, > >> Ahmed. 
> >> > >> > >> > >> On Mon, Jun 25, 2018 at 2:47 PM, Ted Yu wrote: > >> > >> > I ran the test alone which passed. > >> > > >> > Can you include -i on the command line to see if there is some clue from > >> > the output ? > >> > > >> > Here is my environment: > >> > > >> > Java version: 1.8.0_151, vendor: Oracle Corporation > >> > Java home: > >> > /Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home/jre > >> > Default locale: en_US, platform encoding: UTF-8 > >> > OS name: "mac os x", version: "10.11.3", arch: "x86_64", family: "mac" > >> > > >> > FYI > >> > > >> > On Mon, Jun 25, 2018 at 12:59 PM, Ahmed A
Re: [VOTE] KIP-324: Add method to get metrics() in AdminClient
Can you add a little more explanation to the KIP for why you are adding this method? Is it something streams needs, for example? Will it help other applications that use admin client and want to expose metrics? What are the thread-safety guarantees for the map which is returned? best, Colin On Tue, Jun 26, 2018, at 11:29, Yishun Guan wrote: > Hi All, > > I am starting a vote on this KIP: > > https://cwiki.apache.org/confluence/x/lQg0BQ > > Thanks, > Yishun
Re: [VOTE] KIP-324: Add method to get metrics() in AdminClient
On Tue, Jun 26, 2018, at 13:24, Yishun Guan wrote: > Hi Colin, > > I agree with what Guozhang's opinion that because all the other clients > have it (producer, consumer..) and this will gain more visibility for those > application that use admin client. (Now I added this sentence to the KIP) I agree. Thanks. > Since this returns an unmodifiableMap(like all the other client's metrics() > return), I assume this will be thread-safe, what do you think? Please document that it is thread-safe. thanks, Colin > > Thanks, > Yishun > > > On Tue, Jun 26, 2018 at 11:51 AM, Colin McCabe wrote: > > > Can you add a little more explanation to the KIP for why you are adding > > this method? Is it something streams needs, for example? Will it help > > other applications that use admin client and want to expose metrics? > > > > What are the thread-safety guarantees for the map which is returned? > > > > best, > > Colin > > > > > > On Tue, Jun 26, 2018, at 11:29, Yishun Guan wrote: > > > Hi All, > > > > > > I am starting a vote on this KIP: > > > > > > https://cwiki.apache.org/confluence/x/lQg0BQ > > > > > > Thanks, > > > Yishun > >
Re: [VOTE] KIP-324: Add method to get metrics() in AdminClient
P.S. +1 (non-binding) once you add the info about it being thread-safe. best, On Wed, Jun 27, 2018, at 15:23, Colin McCabe wrote: > On Tue, Jun 26, 2018, at 13:24, Yishun Guan wrote: > > Hi Colin, > > > > I agree with what Guozhang's opinion that because all the other clients > > have it (producer, consumer..) and this will gain more visibility for those > > application that use admin client. (Now I added this sentence to the KIP) > > I agree. Thanks. > > > Since this returns an unmodifiableMap(like all the other client's metrics() > > return), I assume this will be thread-safe, what do you think? > > Please document that it is thread-safe. > > thanks, > Colin > > > > > Thanks, > > Yishun > > > > > > On Tue, Jun 26, 2018 at 11:51 AM, Colin McCabe wrote: > > > > > Can you add a little more explanation to the KIP for why you are adding > > > this method? Is it something streams needs, for example? Will it help > > > other applications that use admin client and want to expose metrics? > > > > > > What are the thread-safety guarantees for the map which is returned? > > > > > > best, > > > Colin > > > > > > > > > On Tue, Jun 26, 2018, at 11:29, Yishun Guan wrote: > > > > Hi All, > > > > > > > > I am starting a vote on this KIP: > > > > > > > > https://cwiki.apache.org/confluence/x/lQg0BQ > > > > > > > > Thanks, > > > > Yishun > > >
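A minimal sketch of the shape being voted on, mirroring what the producer and consumer already expose; the class and field names are illustrative, not the actual AdminClient internals.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

class AdminMetricsSketch {
    private final Map<MetricName, Metric> metrics = new HashMap<>();

    // Returning an unmodifiable view lets callers read the metrics concurrently
    // without being able to mutate the client's internal registry -- the
    // thread-safety note requested in the vote above.
    public Map<MetricName, ? extends Metric> metrics() {
        return Collections.unmodifiableMap(metrics);
    }
}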
Re: Kafka system tests contribution
Hi Andriy, Try looking at the logs to see why the test failed. best, Colin On Wed, May 16, 2018, at 07:08, Andriy Sorokhtey wrote: > Hi, > > Did anyone had a chance to take a look at this issue? > > 2018-05-08 15:01 GMT+03:00 Andriy Sorokhtey : > > > Hello Kafka team > > > > I’d like to contribute to the Kafka system tests. > > > > I’ve tried to execute system tests locally and I have some issues. Can > > anyone give me a hand to figure out what’s wrong? > > > > So, I see that multiple system tests are failing when I try to run it with > > the docker or with vagrant. > > Maybe there is some way to debug it using PyCharm. For example, put some > > breakpoint and start debugging, when the test goes to the breakpoint I’d > > like to go to instances and check what’s going on there. > > I’ll be thankful for any advice. > > > > Here is an example of one test failure: > > > >> [INFO:2018-05-03 06:37:19,861]: Triggering test 1 of 37... > >> [INFO:2018-05-03 06:37:19,870]: RunnerClient: Loading test {'directory': > >> '/opt/kafka-dev/tests/kafkatest/tests/streams', 'file_name': > >> 'streams_broker_compatibility_test.py', 'method_name': > >> 'test_compatible_brokers_eos_disabled', 'cls_name': > >> 'StreamsBrokerCompatibility', 'injected_args': {'broker_version': > >> '0.10.1.1'}} > >> [INFO:2018-05-03 06:37:19,874]: RunnerClient: kafkatest.tests.streams. > >> streams_broker_compatibility_test.StreamsBrokerCompatibility. > >> test_compatible_brokers_eos_disabled.broker_version=0.10.1.1: Setting > >> up... > >> [INFO:2018-05-03 06:37:22,484]: RunnerClient: kafkatest.tests.streams. > >> streams_broker_compatibility_test.StreamsBrokerCompatibility. > >> test_compatible_brokers_eos_disabled.broker_version=0.10.1.1: Running... > >> [INFO:2018-05-03 06:38:34,129]: RunnerClient: kafkatest.tests.streams. > >> streams_broker_compatibility_test.StreamsBrokerCompatibility. 
> >> test_compatible_brokers_eos_disabled.broker_version=0.10.1.1: FAIL: > >> Never saw message indicating StreamsTest finished startup on > >> ducker@ducker05 > >> Traceback (most recent call last): > >> File > >> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", > >> line 132, in run > >> data = self.run_test() > >> File > >> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", > >> line 185, in run_test > >> return self.test_context.function(self.test) > >> File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", > >> line 324, in wrapper > >> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > >> File "/opt/kafka-dev/tests/kafkatest/tests/streams/ > >> streams_broker_compatibility_test.py", line 84, in > >> test_compatible_brokers_eos_disabled > >> processor.start() > >> File "/usr/local/lib/python2.7/dist-packages/ducktape/services/service.py", > >> line 234, in start > >> self.start_node(node) > >> File "/opt/kafka-dev/tests/kafkatest/services/streams.py", line 138, in > >> start_node > >> monitor.wait_until('StreamsTest instance started', timeout_sec=60, > >> err_msg="Never saw message indicating StreamsTest finished startup on " + > >> str(node.account)) > >> File > >> "/usr/local/lib/python2.7/dist-packages/ducktape/cluster/remoteaccount.py", > >> line 668, in wait_until > >> allow_fail=True) == 0, **kwargs) > >> File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", > >> line 36, in wait_until > >> raise TimeoutError(err_msg) > >> TimeoutError: Never saw message indicating StreamsTest finished startup > >> on ducker@ducker05 > > > > > > If I figure out what's wrong I can try to fix other tests. > > > > -- > > > > *Sincerely* > > *Andriy Sorokhtey* > > +380681890146 > > > > > > -- > > *Sincerely* > *Andriy Sorokhtey* > +380681890146
[DISCUSS]: KIP-339: Create a new ModifyConfigs API
Hi all, Previously, we discussed some issues with alterConfigs here on the mailing list, and eventually came to the conclusion that the RPC as implemented can't be used for a shell command modifying configurations. I wrote up a small KIP to fix the issues with the RPC. Please take a look at https://cwiki.apache.org/confluence/display/KAFKA/KIP-339%3A+Create+a+new+ModifyConfigs+API best, Colin
Re: [DISCUSS]: KIP-339: Create a new ModifyConfigs API
Hi Ted, That’s a fair question. I think the main reason I didn’t propose that originally is that many people find null values in maps confusing. Also, some newer Java maps don’t support null values, such as ConcurrentHashMap. I’m curious what others think about this. Best, Colin On Wed, Jul 11, 2018, at 21:28, Ted Yu wrote: > bq. Map changes, Set > removals,> > Is it possible to combine the two parameters into one Map where > null Config> value signifies removal of config ? > This way, the following wouldn't occur (reducing un-intended config > removal): > > bq. If a configuration key is specified in both *changes* and > *removals*> > *Cheers* > > On Wed, Jul 11, 2018 at 10:54 AM Colin McCabe > wrote:> > > Hi all, > > > > Previously, we discussed some issues with alterConfigs here on the > > mailing> > list, and eventually came to the conclusion that the RPC as > > implemented> > can't be used for a shell command modifying configurations. > > > > I wrote up a small KIP to fix the issues with the RPC. Please take > > a look> > at > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-339%3A+Create+a+new+ModifyConfigs+API> > > > > > best, > > Colin > >
Re: [DISCUSS]: KIP-339: Create a new ModifyConfigs API
On Fri, Jul 13, 2018, at 17:45, Ted Yu wrote: > Looking at modifyConfigs API, it doesn't seem that ConcurrentHashMap > should be used as the underlying parameter type. I agree that there are other types of maps that do support null values. However, the fact that some official map implementations from the standard library don't support null values makes this a questionable feature to rely on. Imagine being a new user of this API who created a ConcurrentHashMap, tried to insert some null keys, and pass to the API. It would compile, but not work. It would certainly be confusing. > Anyway, to signify that null value is supported, value type can be > declared as Optional.> > FYI Yeah, now that we're on Java 8, Optional could be a good choice here. best, Colin > On Fri, Jul 13, 2018 at 5:35 PM Colin McCabe > wrote:>> __ >> Hi Ted, >> >> That’s a fair question. I think the main reason I didn’t propose >> that originally is that many people find null values in maps >> confusing. Also, some newer java maps don’t support null values, >> such as ConcuurentHashMap. I’m curious what others think about this.>> >> Best, >> Colin >> >> On Wed, Jul 11, 2018, at 21:28, Ted Yu wrote: >> > bq. Map changes, Set >> > removals,>> > >> > Is it possible to combine the two parameters into one Map where >> > null Config>> > value signifies removal of config ? >> > This way, the following wouldn't occur (reducing un-intended config>> > >> > removal): >> > >> > bq. If a configuration key is specified in both *changes* and >> > *removals*>> > >> > *Cheers* >> > >> > On Wed, Jul 11, 2018 at 10:54 AM Colin McCabe >> > wrote:>> > >> > > Hi all, >> > > >> > > Previously, we discussed some issues with alterConfigs here on >> > > the mailing>> > > list, and eventually came to the conclusion that the >> > > RPC as >> > > implemented>> > > can't be used for a shell command modifying >> > > configurations. >> > > >> > > I wrote up a small KIP to fix the issues with the RPC. Please >> > > take a look>> > > at >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-339%3A+Create+a+new+ModifyConfigs+API>> >> > > > > >> > > best, >> > > Colin >> > > >>
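Two possible shapes for the API under discussion, sketched side by side for comparison; the generic types are guesses, not the final KIP-339 signatures.

import java.util.Map;
import java.util.Optional;
import java.util.Set;

interface ModifyConfigsSketch {

    // Option 1: the KIP draft's shape -- separate "changes" and "removals" arguments.
    void modifyConfigs(Map<String, String> changes, Set<String> removals);

    // Option 2: Ted's suggestion with Optional instead of nulls -- a single map where
    // Optional.empty() marks a key for removal, sidestepping Map implementations
    // (such as ConcurrentHashMap) that reject null values.
    void modifyConfigs(Map<String, Optional<String>> changes);
}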
Re: KIP-327: Add describe all topics API to AdminClient
As Jason wrote, this won't scale as the number of partitions increases. We already have users who have tens of thousands of topics, or more. If you multiply that by 100x over the next few years, you end up with this API returning full information about millions of topics, which clearly doesn't work. We discussed this a lot in the original KIP-117 DISCUSS thread which added the Java AdminClient. ListTopics and DescribeTopics were deliberately kept separate because we understood that eventually a single RPC would not be able to return information about all the topics in the cluster. So I have to vote -1 for this proposal as it stands. I do agree that adding a way to describe topics by a regular expression on the server side would be very useful. This would also fix a major scalability problem we have now, which is that when subscribing via a regular expression, clients need to fetch the full list of all topics in the cluster and filter locally. I think a regular expression library like re2 would be ideal for this purpose. re2 is standardized and language-agnostic (it's not tied only to Java). In contrast, Java regular expression change with different releases of the JDK (there were some changes in java 8, for example). Also, re2 regular expressions are linear time, never exponential time. See https://github.com/google/re2j regards, Colin On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote: > The KIP looks good to me. > However, if there is willingness in the community to work on metadata > request with patterns, the feature proposed here and filtering by '*' or > '.*' would be redundant. > > Andras > > > > On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson wrote: > > > Hey Manikumar, > > > > As Kafka begins to scale to larger and larger numbers of topics/partitions, > > I'm a little concerned about the scalability of APIs such as this. The API > > looks benign, but imagine you have have a few million partitions. We > > already expose similar APIs in the producer and consumer, so probably not > > much additional harm to expose it in the AdminClient, but it would be nice > > to put a little thought into some longer term options. We should be giving > > users an efficient way to select a smaller set of the topics they are > > interested in. We have always discussed adding some filtering support to > > the Metadata API. Perhaps now is a good time to reconsider this? We now > > have a convention for wildcard ACLs, so perhaps we can do something > > similar. Full regex support might be ideal given the consumer's > > subscription API, but that is more challenging. What do you think? > > > > Thanks, > > Jason > > > > On Thu, Jul 12, 2018 at 2:35 PM, Harsha wrote: > > > > > Very useful. LGTM. > > > > > > Thanks, > > > Harsha > > > > > > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote: > > > > Hi all, > > > > > > > > I have created a KIP to add describe all topics API to AdminClient . > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > 327%3A+Add+describe+all+topics+API+to+AdminClient > > > > > > > > Please take a look. > > > > > > > > Thanks, > > > > >
Re: KIP-327: Add describe all topics API to AdminClient
Good point. We should probably have a maximum number of results like 1000 or something. That can go in the request RPC as well... Cheers, Colin On Fri, Jul 13, 2018, at 18:15, Ted Yu wrote: > bq. describe topics by a regular expression on the server side > > Should caution be taken if the regex doesn't filter ("*") ? > > Cheers > > On Fri, Jul 13, 2018 at 6:02 PM Colin McCabe > wrote:> > > As Jason wrote, this won't scale as the number of partitions > > increases.> > We already have users who have tens of thousands of topics, or > > more. If> > you multiply that by 100x over the next few years, you end up > > with > > this API> > returning full information about millions of topics, which > > clearly > > doesn't> > work. > > > > We discussed this a lot in the original KIP-117 DISCUSS thread > > which added> > the Java AdminClient. ListTopics and DescribeTopics were > > deliberately kept> > separate because we understood that eventually a > > single RPC would > > not be> > able to return information about all the topics in the cluster. > > So > > I have> > to vote -1 for this proposal as it stands. > > > > I do agree that adding a way to describe topics by a regular > > expression on> > the server side would be very useful. This would also fix > > a major > > scalability problem we have now, which is that when > > subscribing via a> > regular expression, clients need to fetch the full > > list of all > > topics in> > the cluster and filter locally. > > > > I think a regular expression library like re2 would be ideal > > for this> > purpose. re2 is standardized and language-agnostic (it's not > > tied > > only to> > Java). In contrast, Java regular expression change with > > different > > releases> > of the JDK (there were some changes in java 8, for example). > > Also, re2> > regular expressions are linear time, never exponential time. > > See > > https://github.com/google/re2j > > > > regards, > > Colin > > > > > > On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote: > > > The KIP looks good to me. > > > However, if there is willingness in the community to work on > > > metadata> > > request with patterns, the feature proposed here and > > > filtering by > > > '*' or> > > '.*' would be redundant. > > > > > > Andras > > > > > > > > > > > > On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson > > > > > wrote: > > > > > > > Hey Manikumar, > > > > > > > > As Kafka begins to scale to larger and larger numbers of > > topics/partitions, > > > > I'm a little concerned about the scalability of APIs such as > > > > this. The> > API > > > > looks benign, but imagine you have have a few million > > > > partitions. We> > > > already expose similar APIs in the producer and > > > > consumer, so > > > > probably> > not > > > > much additional harm to expose it in the AdminClient, but it > > > > would be> > nice > > > > to put a little thought into some longer term options. We should > > > > be> > giving > > > > users an efficient way to select a smaller set of the topics > > > > they are> > > > interested in. We have always discussed adding some > > > > filtering > > > > support> > to > > > > the Metadata API. Perhaps now is a good time to reconsider this? > > > > We now> > > > have a convention for wildcard ACLs, so perhaps we can do > > > > something> > > > similar. Full regex support might be ideal given the > > > > consumer's> > > > subscription API, but that is more challenging. 
What > > > > do you > > > > think?> > > > > > > > Thanks, > > > > Jason > > > > > > > > On Thu, Jul 12, 2018 at 2:35 PM, Harsha wrote:> > > > > > > > > Very useful. LGTM. > > > > > > > > > > Thanks, > > > > > Harsha > > > > > > > > > > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote: > > > > > > Hi all, > > > > > > > > > > > > I have created a KIP to add describe all topics API to > > > > > > AdminClient> > . > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > 327%3A+Add+describe+all+topics+API+to+AdminClient > > > > > > > > > > > > Please take a look. > > > > > > > > > > > > Thanks, > > > > > > > > > > >
Re: KIP-327: Add describe all topics API to AdminClient
Hi Stephane, Pagination would be useful. But I think the more immediate need is to stop sending stuff over the wire that we don't even use. For example, imagine that you have a cluster with 50,000 topics and your Consumer subscribes to abracadabra*. Perhaps there are actually only 3 topics that match that regular expression. But with the current system, the broker would send all 50,000 topics over the wire to the client. Then the client applies the regular expression, and throws away 49,997 of those entries, and just uses the remaining 3. With pagination, you still have a huge load on the network and the broker from sending all that unnecessary data. With server-side regular expressions, you could send only the stuff you need. best, Colin On Sat, Jul 14, 2018, at 01:06, Stephane Maarek wrote: > Why not paginate ? Then one can retrieve as many topics as desired ? > > On Sat., 14 Jul. 2018, 4:15 pm Colin McCabe, wrote: > > > Good point. We should probably have a maximum number of results like > > 1000 or something. That can go in the request RPC as well... > > Cheers, > > Colin > > > > On Fri, Jul 13, 2018, at 18:15, Ted Yu wrote: > > > bq. describe topics by a regular expression on the server side > > > > > > Should caution be taken if the regex doesn't filter ("*") ? > > > > > > Cheers > > > > > > On Fri, Jul 13, 2018 at 6:02 PM Colin McCabe > > > wrote:> > > > > As Jason wrote, this won't scale as the number of partitions > > > > increases.> > We already have users who have tens of thousands of > > topics, or > > > > more. If> > you multiply that by 100x over the next few years, you > > end up with > > > > this API> > returning full information about millions of topics, which > > clearly > > > > doesn't> > work. > > > > > > > > We discussed this a lot in the original KIP-117 DISCUSS thread > > > > which added> > the Java AdminClient. ListTopics and DescribeTopics > > were > > > > deliberately kept> > separate because we understood that eventually a > > single RPC would > > > > not be> > able to return information about all the topics in the > > cluster. So > > > > I have> > to vote -1 for this proposal as it stands. > > > > > > > > I do agree that adding a way to describe topics by a regular > > > > expression on> > the server side would be very useful. This would > > also fix a major > > > > scalability problem we have now, which is that when > > > > subscribing via a> > regular expression, clients need to fetch the > > full list of all > > > > topics in> > the cluster and filter locally. > > > > > > > > I think a regular expression library like re2 would be ideal > > > > for this> > purpose. re2 is standardized and language-agnostic (it's > > not tied > > > > only to> > Java). In contrast, Java regular expression change with > > different > > > > releases> > of the JDK (there were some changes in java 8, for > > example). > > > > Also, re2> > regular expressions are linear time, never exponential > > time. See > > > > https://github.com/google/re2j > > > > > > > > regards, > > > > Colin > > > > > > > > > > > > On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote: > > > > > The KIP looks good to me. > > > > > However, if there is willingness in the community to work on > > > > > metadata> > > request with patterns, the feature proposed here and > > filtering by > > > > > '*' or> > > '.*' would be redundant.
> > > > > > > > > > Andras > > > > > > > > > > > > > > > > > > > > On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson > > > > > > > wrote: > > > > > > > > > > > Hey Manikumar, > > > > > > > > > > > > As Kafka begins to scale to larger and larger numbers of > > > > topics/partitions, > > > > > > I'm a little concerned about the scalability of APIs such as > > > > > > this. The> > API > > > > > > looks benign, but imagine you have have a few million > > > > > > partitions. We> > > > already expose similar APIs in the produc
Re: KIP-327: Add describe all topics API to AdminClient
Thanks, Manikumar. I've been meaning to bring up KIP-142 again. It would definitely be a nice improvement. best, Colin On Sat, Jul 14, 2018, at 08:51, Manikumar wrote: > Hi Jason and Colin, > > Thanks for the feedback. I agree that having filtering support to the > Metadata API would be useful and solves > the scalability issues. > > But to implement specific use case of "describe all topics", regex > support > won't help. In any case user needs to > call listTopics() to get topic list, and then make describeTopics() > calls > with a subset of the topics set. > This leads to improving existing listTopics() API performance. Colin > already raised a KIP for this: KIP-142 > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-142%3A+Add+ListTopicsRequest+to+efficiently+list+all+the+topics+in+a+cluster> > . > May be we should consider implementing KIP-142. > > Since we have support wildcard ACLs, Initially, I can explore > prefixed/wildcards patterns support to Metadata API. > We can later extend support for regular expressions. > > Thanks > > > > On Sat, Jul 14, 2018 at 2:42 PM Ted Yu wrote: > > > What if broker crashes before all the pages can be returned ? > > > > Cheers > > > > On Sat, Jul 14, 2018 at 1:07 AM Stephane Maarek < > > steph...@simplemachines.com.au> wrote: > > > > > Why not paginate ? Then one can retrieve as many topics as desired ? > > > > > > On Sat., 14 Jul. 2018, 4:15 pm Colin McCabe, wrote: > > > > > > > Good point. We should probably have a maximum number of results like > > > > 1000 or something. That can go in the request RPC as well... > > > > Cheers, > > > > Colin > > > > > > > > On Fri, Jul 13, 2018, at 18:15, Ted Yu wrote: > > > > > bq. describe topics by a regular expression on the server side > > > > > > > > > > Should caution be taken if the regex doesn't filter ("*") ? > > > > > > > > > > Cheers > > > > > > > > > > On Fri, Jul 13, 2018 at 6:02 PM Colin McCabe > > > > > wrote:> > > > > > > As Jason wrote, this won't scale as the number of partitions > > > > > > increases.> > We already have users who have tens of thousands of > > > > topics, or > > > > > > more. If> > you multiply that by 100x over the next few years, you > > > > end up with > > > > > > this API> > returning full information about millions of topics, > > > which > > > > clearly > > > > > > doesn't> > work. > > > > > > > > > > > > We discussed this a lot in the original KIP-117 DISCUSS thread > > > > > > which added> > the Java AdminClient. ListTopics and DescribeTopics > > > > were > > > > > > deliberately kept> > separate because we understood that > > eventually a > > > > single RPC would > > > > > > not be> > able to return information about all the topics in the > > > > cluster. So > > > > > > I have> > to vote -1 for this proposal as it stands. > > > > > > > > > > > > I do agree that adding a way to describe topics by a regular > > > > > > expression on> > the server side would be very useful. This would > > > > also fix a major > > > > > > scalability problem we have now, which is that when > > > > > > subscribing via a> > regular expression, clients need to fetch the > > > > full list of all > > > > > > topics in> > the cluster and filter locally. > > > > > > > > > > > > I think a regular expression library like re2 would be ideal > > > > > > for this> > purpose. re2 is standardized and language-agnostic > > (it's > > > > not tied > > > > > > only to> > Java). 
In contrast, Java regular expression change with > > > > different > > > > > > releases> > of the JDK (there were some changes in java 8, for > > > > example). > > > > > > Also, re2> > regular expressions are linear time, never exponential > > > > time. See > > > > > > https://github.com/google/re2j > > > > > > > > > > > > regards, > > > > > > Colin > > > > >
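For context, the two-step pattern Manikumar describes looks roughly like this with the existing AdminClient (a sketch; the bootstrap address and topic prefix are illustrative):

    import java.util.Properties;
    import java.util.Set;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.TopicDescription;

    public class DescribeSomeTopics {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Step 1: fetch the names of all topics (the part KIP-142 wants to make cheaper).
                Set<String> names = admin.listTopics().names().get();
                // Step 2: filter locally and describe only the subset we care about.
                Set<String> subset = names.stream()
                        .filter(n -> n.startsWith("orders"))
                        .collect(Collectors.toSet());
                for (TopicDescription d : admin.describeTopics(subset).all().get().values()) {
                    System.out.println(d.name() + " has " + d.partitions().size() + " partitions");
                }
            }
        }
    }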
Re: [DISCUSS]: KIP-339: Create a new ModifyConfigs API
On Fri, Jul 13, 2018, at 23:20, Dong Lin wrote: > Hey Colin, > > Thanks for the KIP! > > It seems that the AlterConfigsResult is pretty much the same as > ModifyConfigsResult. Instead of adding ModifyConfigs API and deprecating > AlterConfigs API, would it be simpler to just add API alterConfigs( > Map changes, Set removals, final > ModifyConfigsOptions options)? > > Currently we use the word "alter" in method names such as > "alterReplicaLogDirs" and "alterCheckpointDir". So it is probably more > preferred to keep using the word "alter" instead of "modify" if posssible. > And if we can overload the alterConfigs(...) API to allow incremental > change, it might make sense to keep the existing > alterConfigs(Map Config> configs) for users who simply want to overwrite the entire configs. If we have two functions with these type signatures: > AlterConfigsResult alterConfigs(Map changes); > AlterConfigsResult alterConfigs(Map changes, > Set removals); It will be extremely surprising, even baffling, to users that the second function overload makes incremental changes, whereas the first function overload clears the entire configuration before applying changes. Just looking at the type signatures (which is all most developers will look at, especially if they're using IDE autocomplete), you would not expect such a radical difference between them. You would expect the second one to work just like the first, except maybe it would also perform some removals. Calling the two functions different names is good because it reflects the fact that they are very different. > And those user would not have to make code change due to API deprecation. > What do you think? alterConfigs needs to be deprecated, though. Code using alterConfigs is almost certainly buggy because of the possibility of simultaneous read-modify-write cycles, and the fact that some configs can't be read. best, Colin > > Thanks, > Dong > > On Wed, Jul 11, 2018 at 10:54 AM, Colin McCabe wrote: > > > Hi all, > > > > Previously, we discussed some issues with alterConfigs here on the mailing > > list, and eventually came to the conclusion that the RPC as implemented > > can't be used for a shell command modifying configurations. > > > > I wrote up a small KIP to fix the issues with the RP Please take a look > > at https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > 339%3A+Create+a+new+ModifyConfigs+API > > > > best, > > Colin > >
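To make the read-modify-write hazard concrete, here is a sketch of what callers have to do today with the non-incremental alterConfigs (topic name and value are illustrative). If anyone changes an unrelated key between the describe and the alter, that change is silently overwritten, and write-only keys such as passwords cannot be read back at all.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class NonAtomicConfigUpdate {
        // Sketch only: change one topic config via read-modify-write, which is inherently racy.
        static void setRetention(AdminClient admin, String topic) throws Exception {
            ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topic);
            // 1. Read the current config.
            Config current = admin.describeConfigs(Collections.singleton(resource))
                                  .all().get().get(resource);
            // 2. Copy every existing entry, then modify the one we care about.
            Map<String, ConfigEntry> entries = new HashMap<>();
            for (ConfigEntry entry : current.entries()) {
                entries.put(entry.name(), entry);
            }
            entries.put("retention.ms", new ConfigEntry("retention.ms", "86400000"));
            // 3. Write the whole config back; anything changed concurrently is lost.
            admin.alterConfigs(Collections.singletonMap(resource, new Config(entries.values())))
                 .all().get();
        }
    }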
Re: [DISCUSS]: KIP-339: Create a new ModifyConfigs API
Hi Magnus, Thanks for taking a look. On Mon, Jul 16, 2018, at 11:43, Magnus Edenhill wrote: > Thanks for driving this KIP, Colin. > > I agree with Dong that a new similar modifyConfigs API (and protocol API) > is confusing and that > we should try to extend the current alterConfigs interface to support the > incremental mode instead, > deprecating the non-incremental mode in the process. In the longer term, I think that the non-incremental mode should definitely go away, and not be an option at all. That's why I don't think of this KIP as "adding more options to AlterConfigs" but as getting rid of a broken API. I've described a lot of reasons why non-incremental mode is broken. I've also described why the brokenness is subtle and an easy trap for newbies to fall into. Hopefully everyone agrees that getting rid of non-incremental mode completely should be the eventual goal. I do not think that having a different name for modifyConfigs is confusing. "Deleting all the configs and then setting some designated ones" is a very different operation from "modifying some configurations". Giving the two operations different names expresses the fact that they really are very different. Would it be less confusing if the new function were called alterConfigsIncremental rather than modifyConfigs? I think it's important to have a new name for the new function. If the names are the same, how can we even explain to users which API they should or should not use? "Use the three argument overload, or the two argument overload where the Options class is not the final parameter" That is not user-friendly. You could say that some of the overloads would be deprecated. However, my experience as a Hadoop developer is that most users simply don't care about deprecation warnings. They will use autocomplete in their IDEs and use whatever function seems to have the parameters they need. Hadoop and Kafka themselves use plenty of deprecated APIs. But somehow we expect that our users have much more respect for @deprecated than we ourselves do. I would further argue that function overloads in Java are intended to provide additional parameters, not to fundamentally change the semantics of a function. If you have two functions int addTwoNumbers(int a, int b) and int addTwoNumbers(int a, int b, boolean verbose), they should both add together two numbers. And a user should be able to expect that the plain old addTwoNumbers is equivalent to either addTwoNumbers(verbose=true) or addTwoNumbers(verbose=false), not a totally different operation. Every time programmers violate this contract, it inevitably leads to misunderstanding. One example is how in HDFS there are multiple function overloads for renaming a file. Depending on which one you call, you will get either RENAME or RENAME2, which have different semantics. I think RENAME2 has a different set of return codes surrounding "destination exists" conditions, among other things. Of course users have no idea of whether they're getting RENAME or RENAME2 unless they're developers. It's not obvious from the function call, which is named "rename" in both cases, just with different function parameters. So the whole thing is just a huge source of confusion and user pain. Another thing to consider is that since function overloads are also not an option in C or Go, we need a different solution for those languages anyway. A separate function name works well for this. > > Another thing that I believe is missing is a per-config-entry operation > type, namely SET (ovewrite), APPEND or DELETE. 
> The current proposal only has SET (changes) and DELETE (removals) > semantics, but I understand there are configuration properties (such as SASL > auth) where > it should be possible to APPEND to a configuration property, otherwise we'll > have the same > non-atomic read-modify-write cycle problems as with the current API. > Instead of providing two sets of config: changes and removals, I think > it might be better to just have one set where each Config entry has > the operation type set, this also voids any confusion on what happens > if a property is included in both changes,removals sets. That's a very good point. I guess the idea is that APPEND would add a new entry in the comma-separated (or other delimiter-separated) list of the config key? That would require per-key support, since not all configuration keys have the same delimiter. That's probably not too difficult, though. There are probably also lots of keys where APPEND makes no sense and should be rejected. For example, APPENDing to a configuration controlling the number of active threads for a subsystem does not make sense. Also, if we have APPEND, we probably also want SUBTRACT, ri
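A rough sketch of the per-entry operation type Magnus is suggesting; these class and enum names are purely illustrative, not part of the KIP. Each entry carries its own operation, so one collection can mix sets, appends, and deletes, and a key cannot end up in conflicting "changes" and "removals" parameters.

    import java.util.Arrays;
    import java.util.List;

    public class ConfigOpSketch {
        enum OpType { SET, DELETE, APPEND, SUBTRACT }

        // Illustrative only -- not an actual Kafka class.
        static final class ConfigOp {
            final String name;
            final String value;   // null for DELETE
            final OpType opType;

            ConfigOp(String name, String value, OpType opType) {
                this.name = name;
                this.value = value;
                this.opType = opType;
            }
        }

        public static void main(String[] args) {
            // One batch can mix operation types without separate parameters.
            List<ConfigOp> ops = Arrays.asList(
                    new ConfigOp("retention.ms", "86400000", OpType.SET),
                    new ConfigOp("sasl.enabled.mechanisms", "SCRAM-SHA-256", OpType.APPEND),
                    new ConfigOp("cleanup.policy", null, OpType.DELETE));
            ops.forEach(op -> System.out.println(op.opType + " " + op.name));
        }
    }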
Re: KIP-327: Add describe all topics API to AdminClient
Thanks, Manikumar. best, Colin On Tue, Jul 17, 2018, at 19:44, Manikumar wrote: > Closing this KIP in favor of adding filtering support to the Metadata API > and KIP-142. Will open a new KIP when ready. > Thanks for your reviews. > > On Mon, Jul 16, 2018 at 8:38 AM Colin McCabe wrote: > > > Thanks, Manikumar. I've been meaning to bring up KIP-142 again. It would > > definitely be a nice improvement. > > > > best, > > Colin > > > > > > On Sat, Jul 14, 2018, at 08:51, Manikumar wrote: > > > Hi Jason and Colin, > > > > > > Thanks for the feedback. I agree that having filtering support to the > > > Metadata API would be useful and solves > > > the scalability issues. > > > > > > But to implement specific use case of "describe all topics", regex > > > support > > > won't help. In any case user needs to > > > call listTopics() to get topic list, and then make describeTopics() > > > calls > > > with a subset of the topics set. > > > This leads to improving existing listTopics() API performance. Colin > > > already raised a KIP for this: KIP-142 > > > < > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-142%3A+Add+ListTopicsRequest+to+efficiently+list+all+the+topics+in+a+cluster > > > > > > . > > > May be we should consider implementing KIP-142. > > > > > > Since we have support wildcard ACLs, Initially, I can explore > > > prefixed/wildcards patterns support to Metadata API. > > > We can later extend support for regular expressions. > > > > > > Thanks > > > > > > > > > > > > On Sat, Jul 14, 2018 at 2:42 PM Ted Yu wrote: > > > > > > > What if broker crashes before all the pages can be returned ? > > > > > > > > Cheers > > > > > > > > On Sat, Jul 14, 2018 at 1:07 AM Stephane Maarek < > > > > steph...@simplemachines.com.au> wrote: > > > > > > > > > Why not paginate ? Then one can retrieve as many topics as desired ? > > > > > > > > > > On Sat., 14 Jul. 2018, 4:15 pm Colin McCabe, > > wrote: > > > > > > > > > > > Good point. We should probably have a maximum number of results > > like > > > > > > 1000 or something. That can go in the request RPC as well... > > > > > > Cheers, > > > > > > Colin > > > > > > > > > > > > On Fri, Jul 13, 2018, at 18:15, Ted Yu wrote: > > > > > > > bq. describe topics by a regular expression on the server side > > > > > > > > > > > > > > Should caution be taken if the regex doesn't filter ("*") ? > > > > > > > > > > > > > > Cheers > > > > > > > > > > > > > > On Fri, Jul 13, 2018 at 6:02 PM Colin McCabe > > > > > > > wrote:> > > > > > > > > As Jason wrote, this won't scale as the number of partitions > > > > > > > > increases.> > We already have users who have tens of thousands > > of > > > > > > topics, or > > > > > > > > more. If> > you multiply that by 100x over the next few > > years, you > > > > > > end up with > > > > > > > > this API> > returning full information about millions of > > topics, > > > > > which > > > > > > clearly > > > > > > > > doesn't> > work. > > > > > > > > > > > > > > > > We discussed this a lot in the original KIP-117 DISCUSS thread > > > > > > > > which added> > the Java AdminClient. ListTopics and > > DescribeTopics > > > > > > were > > > > > > > > deliberately kept> > separate because we understood that > > > > eventually a > > > > > > single RPC would > > > > > > > > not be> > able to return information about all the topics in > > the > > > > > > cluster. So > > > > > > > > I have> > to vote -1 for this proposal as it stands. 
> > > > > > > > > > > > > > > > I do agree that adding a way to describe topics by a regular > > > > > > > > expression on> > the server side would be very us
Re: [VOTE] KIP-338 Support to exclude the internal topics in kafka-topics.sh command
+1 (non-binding) cheers, Colin On Mon, Jul 16, 2018, at 01:10, Chia-Ping Tsai wrote: > hi folks, > > The discussion[1] of KIP-338[2] did not get any objection for last 6 > days so it is time to start the voting thread. > > Thanks for your time! > > [1] > https://lists.apache.org/thread.html/9bd4e61b73c901b51132ada49743b9b703d40b85fc4eeaa5c9099900@%3Cdev.kafka.apache.org%3E > > [2] > https://cwiki.apache.org/confluence/display/KAFKA/KIP-338+Support+to+exclude+the+internal+topics+in+kafka-topics.sh+command > > Cheers, > chia-ping
Re: [DISCUSS]: KIP-339: Create a new ModifyConfigs API
I updated the KIP. https://cwiki.apache.org/confluence/display/KAFKA/KIP-339%3A+Create+a+new+IncrementalAlterConfigs+API Updates: * Use "incrementalAlterConfigs" rather than "modifyConfigs," for consistency with the other "alter" APIs. * Implement Magnus' idea of supporting "append" and "subtract" on configuration keys that contain lists. best, Colin On Mon, Jul 16, 2018, at 14:12, Colin McCabe wrote: > Hi Magnus, > > Thanks for taking a look. > > On Mon, Jul 16, 2018, at 11:43, Magnus Edenhill wrote: > > Thanks for driving this KIP, Colin. > > > > I agree with Dong that a new similar modifyConfigs API (and protocol API) > > is confusing and that > > we should try to extend the current alterConfigs interface to support the > > incremental mode instead, > > deprecating the non-incremental mode in the process. > > In the longer term, I think that the non-incremental mode should > definitely go away, and not be an option at all. That's why I don't > think of this KIP as "adding more options to AlterConfigs" but as > getting rid of a broken API. I've described a lot of reasons why non- > incremental mode is broken. I've also described why the brokenness is > subtle and an easy trap for newbies to fall into. Hopefully everyone > agrees that getting rid of non-incremental mode completely should be the > eventual goal. > > I do not think that having a different name for modifyConfigs is > confusing. "Deleting all the configs and then setting some designated > ones" is a very different operation from "modifying some > configurations". Giving the two operations different names expresses > the fact that they really are very different. Would it be less > confusing if the new function were called alterConfigsIncremental rather > than modifyConfigs? > > I think it's important to have a new name for the new function. If the > names are the same, how can we even explain to users which API they > should or should not use? "Use the three argument overload, or the two > argument overload where the Options class is not the final parameter" > That is not user-friendly. > > You could say that some of the overloads would be deprecated. However, > my experience as a Hadoop developer is that most users simply don't care > about deprecation warnings. They will use autocomplete in their IDEs > and use whatever function seems to have the parameters they need. > Hadoop and Kafka themselves use plenty of deprecated APIs. But somehow > we expect that our users have much more respect for @deprecated than we > ourselves do. > > I would further argue that function overloads in Java are intended to > provide additional parameters, not to fundamentally change the semantics > of a function. If you have two functions int addTwoNumbers(int a, int > b) and int addTwoNumbers(int a, int b, boolean verbose), they should > both add together two numbers. And a user should be able to expect that > the plain old addTwoNumbers is equivalent to either > addTwoNumbers(verbose=true) or addTwoNumbers(verbose=false), not a > totally different operation. > > Every time programmers violate this contract, it inevitably leads to > misunderstanding. One example is how in HDFS there are multiple > function overloads for renaming a file. Depending on which one you > call, you will get either RENAME or RENAME2, which have different > semantics. I think RENAME2 has a different set of return codes > surrounding "destination exists" conditions, among other things. 
Of > course users have no idea of whether they're getting RENAME or RENAME2 > unless they're developers. It's not obvious from the function call, > which is named "rename" in both cases, just with different function > parameters. So the whole thing is just a huge source of confusion and > user pain. > > Another thing to consider is that since function overloads are also not > an option in C or Go, we need a different solution for those languages > anyway. A separate function name works well for this. > > > > > Another thing that I believe is missing is a per-config-entry operation > > type, namely SET (ovewrite), APPEND or DELETE. > > The current proposal only has SET (changes) and DELETE (removals) > > semantics, but I understand there are configuration properties (such as > > SASL auth) where > > it should be possible to APPEND to a configuration property, otherwise > > we'll have the same > > non-atomic read-modify-write cycle problems as with the current API. > > Instea
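Under the updated proposal, an append to a list-valued broker config might look something like the sketch below. The class and method names (incrementalAlterConfigs, AlterConfigOp, OpType.APPEND) follow the KIP as written and could still change; the broker id and config value are illustrative.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class IncrementalAlterExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
                // APPEND adds one element to a list-valued config instead of replacing the whole
                // value, avoiding the read-modify-write race of the old alterConfigs call.
                AlterConfigOp op = new AlterConfigOp(
                        new ConfigEntry("sasl.enabled.mechanisms", "SCRAM-SHA-256"),
                        AlterConfigOp.OpType.APPEND);
                admin.incrementalAlterConfigs(
                        Collections.singletonMap(broker, Collections.singletonList(op)))
                     .all().get();
            }
        }
    }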
Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure
need the uncleanable topic names to be accessible through a metric? It seems like the admin should notice that uncleanable partitions are present, and then check the logs? > > * About `max.uncleanable.partitions`, you said that this likely > indicates that the disk is having problems. I'm not sure that is the > case. For the 4 JIRAs that you mentioned about log cleaner problems, all > of them are partition-level scenarios that happened during normal > operation. None of them were indicative of disk problems. I don't think this is a meaningful comparison. In general, we don't accept JIRAs for hard disk problems that happen on a particular cluster. If someone opened a JIRA that said "my hard disk is having problems" we could close that as "not a Kafka bug." This doesn't prove that disk problems don't happen, but just that JIRA isn't the right place for them. I do agree that the log cleaner has had a significant number of logic bugs, and that we need to be careful to limit their impact. That's one reason why I think that a threshold of "number of uncleanable logs" is a good idea, rather than just failing after one IOException. In all the cases I've seen where a user hit a logic bug in the log cleaner, it was just one partition that had the issue. We also should increase test coverage for the log cleaner. > * About marking disks as offline when exceeding a certain threshold, > that actually increases the blast radius of log compaction failures. > Currently, the uncleaned partitions are still readable and writable. > Taking the disks offline would impact availability of the uncleanable > partitions, as well as impact all other partitions that are on the disk. In general, when we encounter I/O errors, we take the disk partition offline. This is spelled out in KIP-112 ( https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD ) : > - Broker assumes a log directory to be good after it starts, and mark log > directory as > bad once there is IOException when broker attempts to access (i.e. read or > write) the log directory. > - Broker will be offline if all log directories are bad. > - Broker will stop serving replicas in any bad log directory. New replicas > will only be created > on good log directory. The behavior Stanislav is proposing for the log cleaner is actually more optimistic than what we do for regular broker I/O, since we will tolerate multiple IOExceptions, not just one. But it's generally consistent. Ignoring errors is not. In any case, if you want to tolerate an unlimited number of I/O errors, you can just set the threshold to an infinite value (although I think that would be a bad idea). best, Colin > > -James > > > > On Jul 23, 2018, at 5:46 PM, Stanislav Kozlovski > > wrote: > > > > I renamed the KIP and that changed the link. Sorry about that. Here is the > > new link: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-346+-+Improve+LogCleaner+behavior+on+error > > > > On Mon, Jul 23, 2018 at 5:11 PM Stanislav Kozlovski > > wrote: > > > >> Hey group, > >> > >> I created a new KIP about making log compaction more fault-tolerant. > >> Please give it a look here and please share what you think, especially in > >> regards to the points in the "Needs Discussion" paragraph. > >> > >> KIP: KIP-346 > >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-346+-+Limit+blast+radius+of+log+compaction+failure> > >> -- > >> Best, > >> Stanislav > >> > > > > > > -- > > Best, > > Stanislav >
Re: [DISCUSS] KIP-289: Improve the default group id behavior in KafkaConsumer
Sorry if this is a silly question, but what's the rationale for switching to using null for the default group id, rather than the empty string? Continuing to use the empty string seems like less churn. And after all, we're not using the empty string group name for anything else. The big incompatible change here is having brokers reject using assign(...) with empty / null group.id. If I understand correctly, the KIP proposes that this change be made on the brokers on the next incompatible Kafka release. But that has nothing to do with client versions. Why not just have a broker config which controls this? Maybe "allow.assign.empty.group.id", or something like that. At first, the default will be true, and then eventually we can flip it over to false. It seems like the main rationale for tying this behavior to the Kafka client version is to force people to stop using the empty group id so that they can upgrade their clients. But it's also possible that people will stop upgrading their Kafka clients instead. That would be pretty negative since they'd miss out on any efficiency and feature improvements in the new clients and eventually have to do more protocol downgrading, etc. best, Colin On Thu, Jul 26, 2018, at 11:50, Vahid S Hashemian wrote: > Hi Jason, > > That makes sense. > I have updated the KIP based on the recent feedback. > > Thanks! > --Vahid > > > > > From: Jason Gustafson > To: dev > Date: 07/25/2018 02:23 PM > Subject:Re: [DISCUSS] KIP-289: Improve the default group id > behavior in KafkaConsumer > > > > Hi Vahid, > > I was thinking we'd only use the old API version if we had to. That is, > only if the user has explicitly configured "" as the group.id. Otherwise, > we'd just use the new one. Another option is to just drop support in the > client for the empty group id, but usually we allow a deprecation period > for changes like this. > > -Jason > > On Wed, Jul 25, 2018 at 12:49 PM, Vahid S Hashemian < > vahidhashem...@us.ibm.com> wrote: > > > Hi Jason, > > > > Thanks for additional clarification. > > > > So the next version of the OffsetCommit API will return an > > INVALID_GROUP_ID error for empty group ids; but on the client side we > call > > the older version of the client until the next major release. > > The table below should summarize this. > > > > +-+ > > | Client (group.id="") | > > +-+ > > | pre-2.1 | 2.1 | 3.0 | > > > +-+---+-++--+ > > | | V5 (cur.) | works | works | works | > > + API > +---+-++--+ > > | | V6| N/A | N/A (calls V5/warning) | INVALID_GROUP_ID > | > > > +-+---+-++--+ > > > > Assumptions: > > * 2.1: The target release version for this KIP > > * 3.0: The next major release > > > > Please advise if you see an issue; otherwise, I'll update the KIP > > accordingly. > > > > Thanks! > > --Vahid > > > > > > > > > > From: Jason Gustafson > > To: dev > > Date: 07/25/2018 12:08 AM > > Subject:***UNCHECKED*** Re: [DISCUSS] KIP-289: Improve the > default > > group idbehavior in KafkaConsumer > > > > > > > > Hey Vahid, > > > > Sorry for the confusion. I think we all agree that going forward, we > > shouldn't support the empty group id, so the question is just around > > compatibility. I think we have to bump the OffsetCommit API version so > > that > > old clients which are unknowingly depending on the default empty group > id > > will continue to work with new brokers. For new versions of the client, > we > > can either drop support for the empty group id immediately or we can > give > > users a grace period. I was thinking we would do the latter. 
We can > change > > the default group.id, but in the case that a user has explicitly > > configured > > the empty group, then we can just use an old version of the OffsetCommit > > API which still supports it. In a future release, we can drop this > support > > and only use the latest OffsetCommit version. Does that make sense? > > > > Thanks, > > Jason > > > > > > On Tue, Jul 24, 2018 at 12:
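To illustrate the distinction being drawn here, a sketch of a consumer that uses assign() with no group.id at all (topic and bootstrap address are illustrative). Under the proposal this standalone usage keeps working, while committing offsets without a group would fail fast instead of silently committing under the "" group.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class StandaloneAssignConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // Note: no group.id is set. Auto commit is disabled so nothing is committed
            // under the default group by accident.
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.assign(Collections.singletonList(new TopicPartition("test-topic", 0)));
                consumer.seekToBeginning(consumer.assignment());
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.offset() + ": " + record.value());
                }
                // consumer.commitSync() needs a group id; under the proposal it would be
                // rejected rather than silently committing under the "" group.
            }
        }
    }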
Re: [DISCUSS] KIP-332: Update AclCommand to use AdminClient API
Hi Manikumar, It's great that you are taking a look at this! Much needed. Just one note: I assume that --authorizer-properties is no longer required if the --bootstrap-server option is specified. We should probably spell this out somewhere in the KIP. thanks, Colin On Mon, Jul 30, 2018, at 06:47, Manikumar wrote: > Bumping this up! > > On Mon, Jul 23, 2018 at 8:00 PM Manikumar wrote: > > > Hi all, > > > > I have created a KIP to use AdminClient API in AclCommand (kafka-acls.sh) > > > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API* > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API> > > > > Please take a look. > > > > Thanks, > >
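For reference, the AdminClient call that kafka-acls.sh would issue under the hood looks roughly like this (a sketch; the principal, host, and topic are illustrative):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.acl.AccessControlEntry;
    import org.apache.kafka.common.acl.AclBinding;
    import org.apache.kafka.common.acl.AclOperation;
    import org.apache.kafka.common.acl.AclPermissionType;
    import org.apache.kafka.common.resource.PatternType;
    import org.apache.kafka.common.resource.ResourcePattern;
    import org.apache.kafka.common.resource.ResourceType;

    public class AddReadAcl {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Roughly what "--add --allow-principal User:alice --operation Read --topic test-topic"
                // would translate to when the tool talks to the brokers instead of ZooKeeper.
                AclBinding binding = new AclBinding(
                        new ResourcePattern(ResourceType.TOPIC, "test-topic", PatternType.LITERAL),
                        new AccessControlEntry("User:alice", "*",
                                AclOperation.READ, AclPermissionType.ALLOW));
                admin.createAcls(Collections.singleton(binding)).all().get();
            }
        }
    }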
Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure
On Wed, Aug 1, 2018, at 11:35, James Cheng wrote: > I’m a little confused about something. Is this KIP focused on log > cleaner exceptions in general, or focused on log cleaner exceptions due > to disk failures? > > Will max.uncleanable.partitions apply to all exceptions (including log > cleaner logic errors) or will it apply to only disk I/o exceptions? There is no difference between "log cleaner exceptions in general" and "log cleaner exceptions due to disk failures." For example, if the data on disk is corrupted we might read a 4-byte size as -1 instead of 100. Then we would get a BufferUnderFlowException later on. This is a subclass of RuntimeException rather than IOException, of course, but it does result from a disk problem. Or we might get exceptions while validating checksums, which may or may not be IOE (I haven't looked). Of course, the log cleaner itself may have a bug, which results in it throwing an exception even if the disk does not have a problem. We clearly want to fix these bugs. But there's no way for the program itself to know that it has a bug and act differently. If an exception occurs, we must assume there is a disk problem. > > I can understand taking the disk offline if there have been “N” I/O > exceptions. Disk errors are user fixable (by replacing the affected > disk). It turns an invisible (soft?) failure into a visible hard > failure. And the I/O exceptions are possibly already causing problems, > so it makes sense to limit their impact. > > But I’m not sure if it makes sense to take a disk offline after”N” logic > errors in the log cleaner. If a log cleaner logic error happens, it’s > rarely user fixable. And it will likely several partitions at once, so > you’re likely to bump up against the max.uncleanable.partitions limit > more quickly. If a disk was taken due to logic errors, I’m not sure what > the user would do. I don't agree that log cleaner bugs "will likely [affect] several partitions at once." Most of the ones I've looked at only affect one or two partitions. In particular the ones that resulted from over-eagerness to use 32-bit math on 64-bit values. If the log cleaner is so buggy that it's useless (the scenario you're describing), and you want to put off an upgrade, then you can set max.uncleanable.partitions to the maximum value to ignore failures. best, Colin > > -James > > Sent from my iPhone > > > On Aug 1, 2018, at 9:11 AM, Stanislav Kozlovski > > wrote: > > > > Yes, good catch. Thank you, James! > > > > Best, > > Stanislav > > > >> On Wed, Aug 1, 2018 at 5:05 PM James Cheng wrote: > >> > >> Can you update the KIP to say what the default is for > >> max.uncleanable.partitions? > >> > >> -James > >> > >> Sent from my iPhone > >> > >>> On Jul 31, 2018, at 9:56 AM, Stanislav Kozlovski > >> wrote: > >>> > >>> Hey group, > >>> > >>> I am planning on starting a voting thread tomorrow. Please do reply if > >> you > >>> feel there is anything left to discuss. > >>> > >>> Best, > >>> Stanislav > >>> > >>> On Fri, Jul 27, 2018 at 11:05 PM Stanislav Kozlovski < > >> stanis...@confluent.io> > >>> wrote: > >>> > >>>> Hey, Ray > >>>> > >>>> Thanks for pointing that out, it's fixed now > >>>> > >>>> Best, > >>>> Stanislav > >>>> > >>>>> On Fri, Jul 27, 2018 at 9:43 PM Ray Chiang wrote: > >>>>> > >>>>> Thanks. Can you fix the link in the "KIPs under discussion" table on > >>>>> the main KIP landing page > >>>>> < > >>>>> > >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals# > >>> ? 
> >>>>> > >>>>> I tried, but the Wiki won't let me. > >>>>> > >>>>> -Ray > >>>>> > >>>>>> On 7/26/18 2:01 PM, Stanislav Kozlovski wrote: > >>>>>> Hey guys, > >>>>>> > >>>>>> @Colin - good point. I added some sentences mentioning recent > >>>>> improvements > >>>>>> in the introductory section. > >>>>>> > >>>>>> *Disk Failure* - I tend to agree with what Colin said - once a disk > >>>>> fails, > >>>>>
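A tiny sketch of the kind of failure described above: a corrupted length field makes an ordinary read blow up with a RuntimeException rather than an IOException, even though the root cause is bad data on disk. The buffer contents here are made up for illustration.

    import java.nio.ByteBuffer;

    public class CorruptSizeDemo {
        public static void main(String[] args) {
            // Pretend this buffer was read from a log segment; the 4-byte size field is corrupt.
            ByteBuffer buf = ByteBuffer.allocate(8);
            buf.putInt(100);          // claims a 100-byte payload...
            buf.putInt(0xCAFEBABE);   // ...but only 4 bytes actually follow
            buf.flip();

            int size = buf.getInt();
            byte[] payload = new byte[size];
            // Throws java.nio.BufferUnderflowException (a RuntimeException, not an IOException),
            // even though the underlying problem is corruption on disk.
            buf.get(payload);
        }
    }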
Re: [DISCUSS] KIP-289: Improve the default group id behavior in KafkaConsumer
Thanks, Jason. I don't have a very strong opinion on this. But like you said, if we skip bumping the RPC versions, this would be a smaller change, which might be good. best, Colin On Wed, Aug 1, 2018, at 17:43, Jason Gustafson wrote: > Hey Vahid, > > I talked with Colin offline. I think specifically he felt the version bump > on the broker was overkill since the broker still has to support the empty > group id for older versions. I had thought that eventually we would be able > to remove those old versions, but it's true that this may not happen until > indefinitely far in the future. I think the main improvement here is > changing the default group.id to null instead of "". I could go either way > on whether bumping the protocol is useful. I do think it is helpful though > to signal clearly that it its use is deprecated and discouraged, especially > in light of the ACL problem. I guess we could just deprecate the use on the > client. What do you think? > > Thanks, > Jason > > On Wed, Aug 1, 2018 at 3:19 PM, Vahid S Hashemian > wrote: > > > Thanks Jason for responding to Colin's concerns. > > > > If there are no other comment / feedback / objection I'll start a vote > > soon. > > > > Thanks. > > --Vahid > > > > > > > > From: Jason Gustafson > > To: dev > > Date: 07/27/2018 10:38 AM > > Subject:Re: [DISCUSS] KIP-289: Improve the default group id > > behavior in KafkaConsumer > > > > > > > > Hey Colin, > > > > The problem is both that the empty group id is the default value and that > > it is actually accepted by the broker for offset commits. Combine that > > with > > the fact that auto commit is enabled by default and you users get > > surprising behavior. If you look at a random Kafka cluster, you'll > > probably > > find a bunch of inadvertent offset commits for the empty group id. I was > > hoping we could distinguish between users who are using the empty group id > > as an accident of the default configuration and those who use it > > intentionally. By default, there will be no group id and the consumer will > > not commit offsets. If a user has actually intentionally used the empty > > group id, however, it will continue to work. I actually think there are > > probably very few people doing this (maybe even no one), but I thought we > > might err on the side of compatibility. > > > > The big incompatible change here is having brokers reject using > > assign(...) > > > with empty / null group.id. > > > > > > This is not correct. In the proposal, the broker will only reject the > > empty > > group id for the new version of OffsetCommit. Older clients, which cannot > > be changed, will continue to work because the old versions of the > > OffsetCommit API still accept the empty group id. The null group id is > > different from the empty group id: it is not allowed in any version of the > > API. It is basically a way to indicate that the consumer has no dependence > > on the coordinator at all, which we actually have a surprising number of > > use cases for. Furthermore, if a user has an actual need for the empty > > group id, it will still be allowed. We are just deprecating it. > > > > -Jason > > > > On Fri, Jul 27, 2018 at 9:56 AM, Colin McCabe wrote: > > > > > Sorry if this is a silly question, but what's the rationale for > > switching > > > to using null for the default group id, rather than the empty string? > > > Continuing to use the empty string seems like less churn. And after > > all, > > > we're not using the empty string group name for anything else. 
> > > > > > The big incompatible change here is having brokers reject using > > > assign(...) with empty / null group.id. If I understand correctly, the > > > KIP proposes that this change be made on the brokers on the next > > > incompatible Kafka release. But that has nothing to do with client > > > versions. Why not just have a broker config which controls this? Maybe > > " > > > allow.assign.empty.group.id", or something like that. At first, the > > > default will be true, and then eventually we can flip it over to false. > > > > > > It seems like the main rationale for tying this behavior to the Kafka > > > client version is to force people to stop using the empty group id so > > that > > > they can upgrade their clients. But it's also possible that people will > > > stop upgradin
Re: [DISCUSS] KIP-332: Update AclCommand to use AdminClient API
+1 for starting the vote cheers, Colin On Wed, Aug 1, 2018, at 08:46, Manikumar wrote: > Hi all, > > If there are no concerns, I will start the voting process soon. > > Thanks > > On Tue, Jul 31, 2018 at 9:08 AM Manikumar wrote: > > > Hi Colin, > > > > Yes, "--authorizer-properties" option is not required with > > "--bootstrap-server" option. Updated the KIP. > > > > > > Thanks, > > > > On Tue, Jul 31, 2018 at 1:30 AM Ted Yu wrote: > > > >> Look good to me. > >> > >> On Mon, Jul 23, 2018 at 7:30 AM Manikumar > >> wrote: > >> > >> > Hi all, > >> > > >> > I have created a KIP to use AdminClient API in AclCommand > >> (kafka-acls.sh) > >> > > >> > * > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API* > >> > < > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API > >> > > > >> > > >> > Please take a look. > >> > > >> > Thanks, > >> > > >> > >
Re: [DISCUSS] KIP-348 Eliminate null from SourceTask#poll()
"No changes to public interface" doesn't seem accurate here. SourceTask#poll is a public interface, right? Kafka connectors that are out-of-tree would certainly break if we disallowed returning null from this method. However, reading the KIP more closely, it seems like both null and the empty list will be supported. Perhaps you should discuss this in the compatibility section? Also, the KIP should probably be renamed "deprecate null" rather than "eliminate null" since it will still be possible to return null here, right? best, Colin On Tue, Jul 31, 2018, at 01:10, Chia-Ping Tsai wrote: > hi all, > > Please take a look at the KIP-348[1] if you have free cycel. It bring a > little change to the usage of SourceTask#poll() > > > [1] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89065853&moved=true#KIP-348EliminatenullfromSourceTask#poll()- > Status > > Cheers, > Chia-Ping
Re: [DISCUSS] KIP-350: Allow kafka-topics.sh to take brokerid as parameter to show partitions associated with it
Thanks for the KIP, Ratish. We should probably specify that it is an error when --broker is specified with operations other than --list, right? best, Colin On Wed, Aug 1, 2018, at 21:28, Ratish Ravindran wrote: > Hi, > > I would like to open a discussion thread on KIP-350: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-350%3A+Allow+kafka-topics.sh+to+take+brokerid+as+parameter+to+show+partitions+associated+with+it > > This KIP is to add the optional brokerid parameter in kafka-topics.sh, > to only get the partitions associated with that specific broker. Currently, > we use grep to do so. > > Let me know your thoughts and suggestions. > > Thanks, > Ratish
Re: [VOTE] KIP-334 Include partitions in exceptions raised during consumer record deserialization/validation
Hi Stanislav, Thanks for the KIP. As a user, "FaultyRecordException" seems a little vague. What's faulty about it? Is it too big? Does it not match the schema of the other records? Was it not found? Of course, we know that the specific problem is with [de]serializing the record, not with any of those things. So we should choose a name that reflects that serialization is the problem. How about RecordSerializationException? best, Colin On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote: > Hi Jason and Ted, > > @Jason > I did not oppose the idea as I'm not too familiar with Java conventions. I > agree it is a non-ideal way for the user to catch the exception so I > changed it back. > > I've updated the KIP and PR with the latest changes. Now, there is only one > public exception `FaultyRecordException` which is thrown to the user in > both scenarios (corrupt record and deserialization error). > Please take a look again. > > Best, > Stanislav > > On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson wrote: > > > Hey Stanislav, > > > > I should have noticed it earlier from your example, but I just realized > > that interfaces don't mix well with exceptions. There is no way to catch > > the interface type, which means you have to depend on instanceof checks, > > which is not very conventional. At the moment, we raise > > SerializationException if there is a problem parsing the error, and we > > raise KafkaException if the record fails its CRC check. Since > > SerializationException extends KafkaException, it seems like we can handle > > both cases in a compatible way by handling both cases with a single type > > that extends SerializationException. Maybe something like > > RecordDeserializationException? > > > > Thanks, > > Jason > > > > On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu wrote: > > > > > +1 > > > Original message From: Stanislav Kozlovski < > > > stanis...@confluent.io> Date: 8/2/18 2:41 AM (GMT-08:00) To: > > > dev@kafka.apache.org Subject: [VOTE] KIP-334 Include partitions in > > > exceptions raised during consumer record deserialization/validation > > > Hey everybody, > > > > > > I'd like to start a vote thread for KIP-334 Include partitions in > > > exceptions raised during consumer record deserialization/validation > > > < > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87297793 > > > > > > > > > > -- > > > Best, > > > Stanislav > > > > > > > > -- > Best, > Stanislav
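The usual reason an application wants the partition and offset in this exception is so it can skip past a poison-pill record. A sketch, using the RecordDeserializationException name suggested in the thread; the exception class, its package, and the topicPartition()/offset() accessors are assumptions, since the KIP had not settled on them at this point.

    import java.time.Duration;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    // The exception proposed in this thread; name and package were not final at the time.
    import org.apache.kafka.common.errors.RecordDeserializationException;

    public class SkipBadRecord {
        // Sketch only: poll, and if a record cannot be deserialized, seek past it and retry.
        static ConsumerRecords<String, String> pollSkippingBadRecords(KafkaConsumer<String, String> consumer) {
            while (true) {
                try {
                    return consumer.poll(Duration.ofMillis(100));
                } catch (RecordDeserializationException e) {
                    // Skip the record that cannot be deserialized and try again.
                    consumer.seek(e.topicPartition(), e.offset() + 1);
                }
            }
        }
    }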
Re: [VOTE] KIP-332: Update AclCommand to use AdminClient API
+1 (non-binding) regards, Colin On Fri, Aug 3, 2018, at 02:27, Rajini Sivaram wrote: > Hi Manikumar, > > +1 (binding) > > Thanks for the KIP! > > On Fri, Aug 3, 2018 at 3:46 AM, Ted Yu wrote: > > > +1 > > > > On Thu, Aug 2, 2018 at 7:33 PM Manikumar > > wrote: > > > > > Hi All, > > > > > > I would like to start voting on KIP-332 which allows AclCommand to use > > > AdminClient API for acl management. > > > > > > KIP: > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > 332%3A+Update+AclCommand+to+use+AdminClient+API > > > > > > Discussion Thread: > > > https://www.mail-archive.com/dev@kafka.apache.org/msg90049.html > > > > > > Thanks, > > > Manikumar > > > > >
Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments
What if we had an internal topic which watchers could listen to for information about partition reassignments? The information could be in JSON, so if we want to add new fields later, we always could. This avoids introducing a new AdminClient API. For clients that want to be notified about partition reassignments in a timely fashion, this avoids the "polling an AdminClient API in a tight loop" antipattern. It allows watchers to be notified in a simple and natural way about what is going on. Access can be controlled by the existing topic ACL mechanisms. best, Colin On Fri, Dec 22, 2017, at 06:48, Tom Bentley wrote: > Hi Steven, > > I must admit that I didn't really considered that option. I can see how > attractive it is from your perspective. In practice it would come with lots > of edge cases which would need to be thought through: > > 1. What happens if the controller can't produce a record to this topic > because the partitions leader is unavailable? > 2. One solution to that is for the topic to be replicated on every broker, > so that the controller could elect itself leader on controller failover. > But that raises another problem: What if, upon controller failover, the > controller is ineligible for leader election because it's not in the ISR? > 3. The above questions suggest the controller might not always be able to > produce to the topic, but the controller isn't able to control when other > brokers catch up replicating moved partitions and has to deal with those > events. The controller would have to record (in memory) that the > reassignment was complete, but hadn't been published, and publish later, > when it was able to. > 4. Further to 3, we would need to recover the in-memory state of > reassignments on controller failover. But now we have to consider what > happens if the controller cannot *consume* from the topic. > > This seems pretty complicated to me. I think each of the above points has > alternatives (or compromises) which might make the problem more tractable, > so I'd welcome hearing from anyone who has ideas on that. In particular > there are parallels with consumer offsets which might be worth thinking > about some more. > > I would be useful it define better the use case we're trying to cater to > here. > > * Is it just a notification that a given reassignment has finished that > you're interested in? > * What are the consequences if such a notification is delayed, or dropped > entirely? > > Regards, > > Tom > > > > On 19 December 2017 at 20:34, Steven Aerts wrote: > > > Hello Tom, > > > > > > when you were working out KIP-236, did you consider migrating the > > reassignment > > state from zookeeper to an internal kafka topic, keyed by partition > > and log compacted? > > > > It would allow an admin client and controller to easily subscribe for > > those changes, > > without the need to extend the network protocol as discussed in KIP-240. > > > > This is just a theoretical idea I wanted to share, as I can't find a > > reason why it would > > be a stupid idea. > > But I assume that in practice, this will imply too much change to the > > code base to be > > viable. > > > > > > Regards, > > > > > >Steven > > > > > > 2017-12-18 11:49 GMT+01:00 Tom Bentley : > > > Hi Steven, > > > > > > I think it would be useful to be able to subscribe yourself on updates of > > >> reassignment changes. 
> > > > > > > > > I agree this would be really useful, but, to the extent I understand the > > > networking underpinnings of the admin client, it might be difficult to do > > > well in practice. Part of the problem is that you might "set a watch" (to > > > borrow the ZK terminology) via one broker (or the controller), only for > > > that broker to fail (or the controller be re-elected). Obviously you can > > > detect the loss of connection and set a new watch via a different broker > > > (or the new controller), but that couldn't be transparent to the user, > > > because the AdminClient doesn't know what changed while it was > > > disconnected/not watching. > > > > > > Another issue is that to avoid races you really need to combine fetching > > > the current state with setting the watch (as is done in the native > > > ZooKeeper API). I think there are lots of subtle issues of this sort > > which > > > would need to be addressed to make something reliable. > > > > > > In the mean time, ZooKeeper already
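A rough sketch of what a watcher of such a topic could look like, assuming a hypothetical compacted internal topic named "__reassignments" whose keys identify the partition and whose values are JSON status blobs. None of these names exist today; they are purely illustrative of the idea.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReassignmentWatcher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "reassignment-watcher");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // "__reassignments" is hypothetical; a key per topic-partition, a JSON value per update.
                consumer.subscribe(Collections.singletonList("__reassignments"));
                while (true) {
                    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                        System.out.println("Reassignment update for " + record.key() + ": " + record.value());
                    }
                }
            }
        }
    }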
Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names
On Sat, Jan 6, 2018, at 16:00, Ewen Cheslack-Postava wrote: > re: whitespace characters, I'm fine with the restriction since I don't see > it becoming an issue in practice. I just don't see any reason to restrict > it so it seems like we're going out of our way and doing extra work to be > restrictive, but without clear motivation. There are very good reasons not to support control characters in file names, topic names, log files, etc. See http://seclists.org/fulldisclosure/2003/Feb/att-341/Termulation.txt There are a bunch of CVEs about this, too. Because of the (in my opinion, mistaken) decision to allow control characters in UNIX filenames, even echoing a file name to your terminal is a security vulnerability. best, Colin > > In general my default approach (without context of a specific system) would > be to accept anything that we can encode in UTF-8 and only apply > restrictions where it becomes necessary (e.g. we need to define a delimiter > for some reason). The constraints of URLs introduce some complexity (you > need escaping), but probably generally still allow this. If I can use an > emoji when naming things, then I'm probably happy :) Whitespace characters > definitely have some other issues (e.g. you can have non-visible whitespace > which obscures which connector you're actually working with), but despite > the JIRA linked, I wasn't really convinced they need special handling. It > seems like a really weird issue to encounter in the first place. > > -Ewen > > On Fri, Jan 5, 2018 at 8:10 AM, Randall Hauch wrote: > > > Sönke, I'm happy with the current proposal. > > > > Ewen, the proposal allows any characters in the name as long as they are > > properly escaped/encoded. That seems to adhere to the robustness principle. > > The only exception is that the proposal trims leading and trailing > > whitespace characters in an attempt to reduce user errors. Can you please > > clarify that you're okay with this behavior? I agree that technically we > > can (and currently do) support whitespace-only names, but users have > > reported this as problematic, and it also would be confusing for most user > > interfaces. > > > > Best regards, > > > > Randall > > > > On Thu, Jan 4, 2018 at 10:31 PM, Ewen Cheslack-Postava > > wrote: > > > > > Very late to the game here, but a few thoughts: > > > > > > 1. Regarding whether KIP is necessary, I don't mind doing it for > > > documentation sake, but I would classify any mishandling of connector > > names > > > here as a bug. Which doesn't require a KIP to fix. > > > > > > 2. For support of characters, Kafka has some history of just being > > > restrictive (e.g., see topic name restrictions), but I personally > > disagree > > > with this approach. I think it is better to be liberal in what we accept > > > and just document limitations. I think our default should be to accept > > any > > > user input and document why we can't handle certain inputs and how the > > user > > > should adapt if we can't. In general I try to work under the robustness > > > principle: *Be conservative in what you do, be liberal in what you accept > > > from others* > > > > > > 3. Related to 2, there were some cases like whitespace-only connector > > > names. This seems extremely weird and not critical, so I'm fine not > > > supporting it officially, but technically I don't see any reason it > > > shouldn't be supported with any appropriate escaping (i.e. what would it > > > break for us?). 
> > > > > > But in general, I think just being more explicit about expectations is > > > great and it'd be great to set baseline expectations. > > > > > > -Ewen > > > > > > > > > > > > On Mon, Nov 20, 2017 at 12:33 AM, Sönke Liebau < > > > soenke.lie...@opencore.com.invalid> wrote: > > > > > > > @Randall: are you happy with the KIP as it stands so I can call for a > > > vote, > > > > or are there any outstanding items still to discuss? > > > > > > > > Same question to anybody else who'd like to participate of course :) > > > > > > > > On Thu, Nov 16, 2017 at 5:35 PM, Sönke Liebau < > > > soenke.lie...@opencore.com> > > > > wrote: > > > > > > > > > Sounds good. I've added a few sentences to this effect to the KIP. > > > > > > > > > > On Thu, Nov 16, 2017 at 5:
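For reference, the behaviour being discussed in this thread (trim surrounding whitespace, otherwise accept any UTF-8 name but reject control characters) can be sketched roughly as follows. This is an illustration only, not the actual Connect implementation; the class name and messages are made up:

    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    public final class ConnectorNameCheck {

        // Trim surrounding whitespace, then reject empty names and names
        // containing ISO control characters (ASCII 0-31 and 127, among others).
        public static String validate(String name) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("Connector name must not be empty or whitespace-only");
            }
            trimmed.codePoints().forEach(cp -> {
                if (Character.isISOControl(cp)) {
                    throw new IllegalArgumentException("Connector name must not contain control characters");
                }
            });
            return trimmed;
        }

        public static void main(String[] args) throws Exception {
            String name = validate("  caf\u00e9 connector \ud83d\ude42  ");  // accents, spaces and emoji pass
            // Names still need escaping when they appear in REST URLs.
            System.out.println(URLEncoder.encode(name, StandardCharsets.UTF_8.name()));
            validate("bad\u001bname");  // throws: contains ESC (ASCII 27)
        }
    }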
Re: [DISCUSS] KIP 226 - Dynamic Broker Configuration
On Mon, Dec 18, 2017, at 13:40, Jason Gustafson wrote: > Hi Rajini, > > Looking good. Just a few questions. > > 1. (Related to Jay's comment) Is the validate() method on Reconfigurable > necessary? I would have thought we'd validate using the ConfigDef. Do you > have a use case in mind in which the reconfigurable component only permits > certain reconfigurations? Hi, Sorry if this is a dumb question, but when we talk about validating on the ConfigDef, we're talking about validating on the server side, right? The software on the client side might be older or newer than the software on the broker side, so it seems inadvisable to do the validation there. Also, after a software downgrade, when the broker is restarted, it might find that there is a configuration key that is stored in ZK that is not dynamic in its (older) software version. It seems like, with the current proposal, the broker will use the value found in the local configuration (config file) rather than the new ZK version. Should the broker print out a WARN message in that scenario? best, Colin > 2. Should Reconfigurable extend Configurable or is the initial > configuration also done through reconfigure()? I ask because not all > plugins interfaces currently extend Configurable (e.g. > KafkaPrincipalBuilder). > 3. You mentioned a couple changes to DescribeConfigsOptions and > DescribeConfigsResult. Perhaps we should list the changes explicitly? One > not totally obvious case is what the synonyms() getter would return if the > option is not specified (i.e. should it raise an exception or return an > empty list?). > 4. Config entries in the DescribeConfigs response have an is_default flag. > Could that be replaced with the more general config_source? > 5. Bit of an internal question, but how do you handle config dependencies? > For example, suppose I want to add a listener and configure its principal > builder at once. You'd have to validate the principal builder config in the > context of the listener config, so I guess the order of the entries in > AlterConfigs is significant? > 6. KIP-48 (delegation tokens) gives us a master secret which is shared by > all brokers. Do you think we would make this dynamically configurable? > Alternatively, it might be possible to use it to encrypt the other > passwords we store in zookeeper. > > Thanks, > Jason > > > > On Mon, Dec 18, 2017 at 10:16 AM, Rajini Sivaram > wrote: > > > Hi Jay, > > > > Thank you for reviewing the KIP. > > > > 1) Yes, makes sense. I will update the PR. There are some config updates > > that may be allowed depending on the context (e.g. some security configs > > can be updated for new listeners, but not existing listeners). Perhaps it > > is ok to mark them dynamic in the documentation. AdminClient would give > > appropriate error messages if the update is not allowed. > > 2) Internally, in the implementation, a mixture of direct config updates > > (e.g log config as you have pointed out) and reconfigure method invocations > > (e.g. SslFactory) are used. For configurable plugins (e.g. metrics > > reporter), we require the Reconfigurable interface to ensure that we can > > validate any custom configs and avoid reconfiguration for plugin versions > > that don't support it. > > > > > > On Mon, Dec 18, 2017 at 5:49 PM, Jay Kreps wrote: > > > > > Two thoughts on implementation (shouldn't effect the KIP): > > > > > >1. 
It might be nice to add a parameter to ConfigDef which says > > whether a > >configuration is dynamically updatable or not so that we can give > > error > >messages if it isn't and also have it reflected in the auto-generated > > > docs. > >2. For many systems they don't really need to take action if a config > >changes, they just need to use the new value. Changing them all to > >Reconfigurable requires managing a fair amount of mutability in each > > > class > >that accepts changes. Some need this since they need to take actions > > > when a > >config changes, but it seems like many just need to update some value. > > > For > >the latter you might just be able to do something like what we do for > >LogConfig where there is a single CurrentConfig instance that has a > >reference to the current KafkaConfig and always reference your > > > configurable > >parameters via this (e.g. config.current.myConfig). Dunno if that is > >actually better, but thought I'd throw it out there. > > > -Jay
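A minimal sketch of the single current-config holder pattern described in point 2 above (class and config names here are illustrative, not Kafka's internals): components keep a reference to the holder and always read through it, so a dynamic update only has to swap one volatile reference instead of invoking a reconfigure callback on every class.

    import java.util.Map;

    class CurrentConfig {
        private volatile Map<String, Object> current;

        CurrentConfig(Map<String, Object> initial) {
            this.current = initial;
        }

        Map<String, Object> current() {
            return current;
        }

        // Called by the dynamic-config machinery after validation; readers see
        // the new values on their next access, with no per-component hook.
        void update(Map<String, Object> updated) {
            this.current = updated;
        }
    }

    class SomeComponent {
        private final CurrentConfig config;

        SomeComponent(CurrentConfig config) {
            this.config = config;
        }

        int maxMessageBytes() {
            // Always dereference through the holder rather than caching the value.
            return (Integer) config.current().getOrDefault("message.max.bytes", 1000012);
        }
    }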
Re: [DISCUSS] KIP 226 - Dynamic Broker Configuration
Thanks, Rajini. That makes sense. regards, Colin On Tue, Jan 9, 2018, at 14:38, Rajini Sivaram wrote: > Hi Colin, > > Thank you for reviewing. > > Yes, validation is done on the broker, not the client. > > All configs from ZooKeeper are processed and any config that could not be > applied are logged as warnings. This includes any configs that are not > dynamic in the broker version or any configs that are not supported in the > broker version. If you downgrade to a version that is older than this KIP > (1.0 for example), then you don't get any warnings however. > > > On Tue, Jan 9, 2018 at 9:38 PM, Colin McCabe wrote: > > > On Mon, Dec 18, 2017, at 13:40, Jason Gustafson wrote: > > > Hi Rajini, > > > > > > Looking good. Just a few questions. > > > > > > 1. (Related to Jay's comment) Is the validate() method on Reconfigurable > > > necessary? I would have thought we'd validate using the ConfigDef. Do you > > > have a use case in mind in which the reconfigurable component only > > permits > > > certain reconfigurations? > > > > Hi, > > > > Sorry if this is a dumb question, but when we talk about validating on the > > ConfigDef, we're talking about validating on the server side, right? The > > software on the client side might be older or newer than the software on > > the broker side, so it seems inadvisable to do the validation there. > > > > Also, after a software downgrade, when the broker is restarted, it might > > find that there is a configuration key that is stored in ZK that is not > > dynamic in its (older) software version. It seems like, with the current > > proposal, the broker will use the value found in the local configuration > > (config file) rather than the new ZK version. Should the broker print out > > a WARN message in that scenario? > > > > best, > > Colin > > > > > 2. Should Reconfigurable extend Configurable or is the initial > > > configuration also done through reconfigure()? I ask because not all > > > plugins interfaces currently extend Configurable (e.g. > > > KafkaPrincipalBuilder). > > > 3. You mentioned a couple changes to DescribeConfigsOptions and > > > DescribeConfigsResult. Perhaps we should list the changes explicitly? One > > > not totally obvious case is what the synonyms() getter would return if > > the > > > option is not specified (i.e. should it raise an exception or return an > > > empty list?). > > > 4. Config entries in the DescribeConfigs response have an is_default > > flag. > > > Could that be replaced with the more general config_source? > > > 5. Bit of an internal question, but how do you handle config > > dependencies? > > > For example, suppose I want to add a listener and configure its principal > > > builder at once. You'd have to validate the principal builder config in > > the > > > context of the listener config, so I guess the order of the entries in > > > AlterConfigs is significant? > > > 6. KIP-48 (delegation tokens) gives us a master secret which is shared by > > > all brokers. Do you think we would make this dynamically configurable? > > > Alternatively, it might be possible to use it to encrypt the other > > > passwords we store in zookeeper. > > > > > > Thanks, > > > Jason > > > > > > > > > > > > On Mon, Dec 18, 2017 at 10:16 AM, Rajini Sivaram < > > rajinisiva...@gmail.com> > > > wrote: > > > > > > > Hi Jay, > > > > > > > > Thank you for reviewing the KIP. > > > > > > > > 1) Yes, makes sense. I will update the PR. There are some config > > updates > > > > that may be allowed depending on the context (e.g. 
some security > > configs > > > > can be updated for new listeners, but not existing listeners). Perhaps > > it > > > > is ok to mark them dynamic in the documentation. AdminClient would give > > > > appropriate error messages if the update is not allowed. > > > > 2) Internally, in the implementation, a mixture of direct config > > updates > > > > (e.g log config as you have pointed out) and reconfigure method > > invocations > > > > (e.g. SslFactory) are used. For configurable plugins (e.g. metrics > > > > reporter), we require the Reconfigurable interface to ensure that we > > can > > > > validate any custom configs and avoid reconfiguration for plugin > > versions
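For readers following the thread, the plugin contract being discussed has roughly this shape (a sketch based on the discussion above; the exact interface that ends up in Kafka may differ): the broker asks a plugin which configs it supports dynamically, validates a proposed update before persisting it to ZooKeeper, and only then applies it.

    import java.util.Map;
    import java.util.Set;

    interface Reconfigurable {
        // Configs this plugin is able to pick up at runtime.
        Set<String> reconfigurableConfigs();

        // Broker-side validation of a proposed update, so bad values can be
        // rejected before anything is written to ZooKeeper.
        void validateReconfiguration(Map<String, ?> configs);

        // Apply an update that has already passed validation.
        void reconfigure(Map<String, ?> configs);
    }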
Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names
On Fri, Jan 12, 2018, at 08:03, Sönke Liebau wrote: > Hi everybody, > > from reading the discussion I understand that we have two things still > open for discussion. > > Ewen is still a bit on the fence about whether or not we trim > whitespace characters but seems to favor not doing it due to there not > being a real issue with them. I think it mostly boils down to a > question of general preference. I am happy to change the code to allow > leading and trailing whitespace characters again. If someone has a use > case for these, so be it - I don't see a technical reason against > them. Personally I think it is more likely that someone accidentally > gets a whitespace character in his connector name somehow and > subsequently has a confusing time figuring it out, but it shouldn't be > that tough to spot and is correct behavior, so no issue with it. > TL/DR: I'm happy either way :) > > Colin brought up control characters and that we should disallow these > in connector names. The specific one that is mentioned in his link is > ASCII 27 - ESC - \e so one possibility would be to explicitly > blacklist this. The rest of the control characters (ASCII 0 through 31 > and 127) should be less critical I think, but since there is no way of > knowing all software that might look at these strings and interpret > them there is no real way of being certain. I think there is a case to > be made for disallowing all control characters (and their respective > escape sequences where applicable) in connector names - perhaps with > the possible exclusion of \n \r \t . +1 Colin > > Thoughts? > > Kind regards, > Sönke > > > > On Wed, Jan 10, 2018 at 7:23 AM, Ewen Cheslack-Postava > wrote: > > great point, I'm always for exclusions where they make sense. i just prefer > > to include by default w/ exclusions when necessary to listing explicit > > inclusions and being restrictive. (and security updates immediately as > > needed). > > > > If you have a set of characters you think we should exclude, I think it > > would be good to add them here or in a subsequent KIP! > > > > -Ewen > > > > On Tue, Jan 9, 2018 at 1:30 PM, Colin McCabe wrote: > > > >> On Sat, Jan 6, 2018, at 16:00, Ewen Cheslack-Postava wrote: > >> > re: whitespace characters, I'm fine with the restriction since I don't > >> see > >> > it becoming an issue in practice. I just don't see any reason to restrict > >> > it so it seems like we're going out of our way and doing extra work to be > >> > restrictive, but without clear motivation. > >> > >> There are very good reasons not to support control characters in file > >> names, topic names, log files, etc. > >> > >> See http://seclists.org/fulldisclosure/2003/Feb/att-341/Termulation.txt > >> > >> There are a bunch of CVEs about this, too. Because of the (in my opinion, > >> mistaken) decision to allow control characters in UNIX filenames, even > >> echoing a file name to your terminal is a security vulnerability. > >> > >> best, > >> Colin > >> > >> > >> > > >> > In general my default approach (without context of a specific system) > >> would > >> > be to accept anything that we can encode in UTF-8 and only apply > >> > restrictions where it becomes necessary (e.g. we need to define a > >> delimiter > >> > for some reason). The constraints of URLs introduce some complexity (you > >> > need escaping), but probably generally still allow this. If I can use an > >> > emoji when naming things, then I'm probably happy :) Whitespace > >> characters > >> > definitely have some other issues (e.g. 
you can have non-visible > >> whitespace > >> > which obscures which connector you're actually working with), but despite > >> > the JIRA linked, I wasn't really convinced they need special handling. It > >> > seems like a really weird issue to encounter in the first place. > >> > > >> > -Ewen > >> > > >> > On Fri, Jan 5, 2018 at 8:10 AM, Randall Hauch wrote: > >> > > >> > > Sönke, I'm happy with the current proposal. > >> > > > >> > > Ewen, the proposal allows any characters in the name as long as they > >> are > >> > > properly escaped/encoded. That seems to adhere to the robustness > >> principle. > >> > > The only exception is that the proposal trims leading and trailing >
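To make the terminal-escape concern concrete, a tiny illustration (the connector name is made up): a name containing ASCII 27 carries a terminal control sequence with it, so merely logging or echoing the name can alter the state of whatever terminal displays it.

    public class EscapeDemo {
        public static void main(String[] args) {
            // "\033" is ESC (ASCII 27); ESC [ 2 J clears the screen on most terminals.
            String connectorName = "innocent-looking\033[2Jname";
            // Simply printing the name executes the control sequence in the viewer's terminal.
            System.out.println("Creating connector: " + connectorName);
        }
    }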
Re: Vote for KIP-245: Use Properties instead of StreamsConfig in KafkaStreams constructor
Why not just have a StreamsConfig constructor that takes a Properties object? This has a few advantages. Firstly, because it's purely additive, it doesn't create any deprecated functions or compatibility issues that we have to clean up later. Secondly, if we decide to do something interesting with StreamsConfig later, we still have it. For example, we could have a builder, or some interesting ways to serialize it or send it to a string. best, Colin On Tue, Jan 16, 2018, at 09:54, Matthias J. Sax wrote: > Thanks for updating the KIP. > > I am recasting my vote +1 (binding). > > > -Matthias > > On 1/13/18 4:30 AM, Boyang Chen wrote: > > Hey Matt and Guozhang, > > > > > > I have already updated the pull > > request: https://github.com/apache/kafka/pull/4354 > > > > and the > > KIP: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-245%3A+Use+Properties+instead+of+StreamsConfig+in+KafkaStreams+constructor > > > > > > to reflect the change proposed by Guozhang(adding a 4th constructor) > > > > > > Let me know your thoughts! > > > > > > Best, > > > > Boyang > > > > > > > > > > *From:* Boyang Chen > > *Sent:* Saturday, January 13, 2018 9:37 AM > > *To:* Matthias J. Sax > > *Subject:* Re: Vote for KIP-245: Use Properties instead of StreamsConfig > > in KafkaStreams constructor > > > > > > Sounds good, will do. However I don't receive the +1 emails, interesting... > > > > > > > > > > *From:* Matthias J. Sax > > *Sent:* Saturday, January 13, 2018 9:32 AM > > *To:* Boyang Chen > > *Subject:* Re: Vote for KIP-245: Use Properties instead of StreamsConfig > > in KafkaStreams constructor > > > > Guozhang left a comment about having a 4th overload. I agree that we > > should add this 4th overload. > > > > Please update the KIP accordingly and follow up on the mailing list > > thread. Than we can vote it through. > > > > Thx. > > > > -Matthias > > > > On 1/12/18 4:48 PM, Boyang Chen wrote: > >> Hey Matt, > >> > >> > >> I haven't received any approval/veto on this KIP. Everything is ready > >> but only needs one approval. Any step I should take? > >> > >> Thanks for the help! > >> > >> Boyang > >> > >> > >> > >> > >> *From:* Matthias J. Sax > >> *Sent:* Saturday, January 13, 2018 3:47 AM > >> *To:* dev@kafka.apache.org > >> *Subject:* Re: Vote for KIP-245: Use Properties instead of StreamsConfig > >> in KafkaStreams constructor > >> > >> Boyang, > >> > >> what is the status of this KIP? The release plan for 1.1 was just > >> announced and we like to get this KIP into the release. > >> > >> Thx. > >> > >> > >> -Matthias > >> > >> On 1/2/18 11:18 AM, Guozhang Wang wrote: > >>> Boyang, > >>> > >>> Thanks for the proposed change, the wiki page lgtm. One minor comment > >>> otherwise I'm +1: > >>> > >>> For the new API, we now also have a constructor that accepts both a > >>> clientSupplier and a Time, so we should consider having four overloads in > >>> total: > >>> > >>> > >>> // New API (using Properties) > >>> public KafkaStreams(final Topology, final Properties props) > >>> public KafkaStreams(final Topology, final Properties props, final Time > >>> time) > >>> public KafkaStreams(final Topology, final Properties props, final > >>> KafkaClientSupplier > >>> clientSupplier) > >>> public KafkaStreams(final Topology, final Properties props, final > >>> KafkaClientSupplier > >>> clientSupplier, final Time time) > >>> > >>> Guozhang > >>> > >>> On Tue, Dec 26, 2017 at 7:26 PM, Satish Duggana > >>> wrote: > >>> > >>>> Thanks for the KIP, +1 from me. 
> >>>> > >>>> On Wed, Dec 27, 2017 at 7:42 AM, Bill Bejeck wrote: > >>>>> Thanks for the KIP. +1 for me. > >>>>> > >>>
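For context, this is what construction looks like from the caller's side with the Properties-based overloads quoted above (a sketch based on the signatures listed in this thread; topic names and settings are illustrative):

    import java.util.Properties;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.Topology;

    public class StreamsStartup {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kip-245-example");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("input-topic").to("output-topic");
            Topology topology = builder.build();

            // Proposed overload: pass the Properties directly instead of wrapping
            // them in a StreamsConfig first.
            KafkaStreams streams = new KafkaStreams(topology, props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }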
Re: [VOTE] KIP-229: DeleteGroups API
Thanks for this KIP, Vahid. +1 (non-binding). How about creating a GROUP_ID_NOT_FOUND error, rather than re-using INVALID_GROUP_ID here? INVALID_GROUP_ID is used to indicate that the group id itself is bad (contains invalid characters, is an empty string, etc.). A group ID not being found on the server is different than the ID itself not being valid. We should probably not combine these two error cases. On Tue, Jan 16, 2018, at 09:31, Guozhang Wang wrote: > Thanks Vahid, +1 (binding) from me. > > A minor question as for tooling improvements you mentioned in the wiki, are > we going to add a `--delete` option on the `kafka-consumer-groups.sh` > script? I thought it is "yes" but the Proposed Changes section does not > explicitly mention it. +1 for a --delete option on kafka-consumer-groups.sh regards, Colin > > > Guozhang > > > On Tue, Jan 16, 2018 at 5:55 AM, Mickael Maison > wrote: > > > +1 (non binding) > > Thanks for the KIP > > > > On Tue, Jan 16, 2018 at 11:54 AM, Rajini Sivaram > > wrote: > > > Hi Vahid, > > > > > > +1 (binding) > > > > > > Thanks for the KIP. > > > > > > Regards, > > > > > > Rajini > > > > > > On Tue, Jan 16, 2018 at 10:24 AM, Edoardo Comar > > wrote: > > > > > >> +1 (non binding) - thanks Vahid > > >> > > >> -- > > >> > > >> Edoardo Comar > > >> > > >> IBM Message Hub > > >> > > >> IBM UK Ltd, Hursley Park, SO21 2JN > > >> > > >> > > >> > > >> From: Ted Yu > > >> To: dev@kafka.apache.org > > >> Date: 15/01/2018 20:33 > > >> Subject:Re: [VOTE] KIP-229: DeleteGroups API > > >> > > >> > > >> > > >> +1 > > >> > > >> On Mon, Jan 15, 2018 at 12:22 PM, Jeff Widman > > wrote: > > >> > > >> > +1 (non-binding) > > >> > > > >> > On Jan 15, 2018 10:23 AM, "Vahid S Hashemian" > > >> > > >> > wrote: > > >> > > > >> > > Happy Monday, > > >> > > > > >> > > I believe the concerns on this KIP have been addressed in the > > current > > >> > > version of the KIP: > > >> > > > > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki. > > >> apache.org_confluence_display_KAFKA_KIP-2D&d=DwIBaQ&c=jf_ > > >> iaSHvJObTbx-siA1ZOg&r=EzRhmSah4IHsUZVekRUIINhltZK7U0OaeRo7hgW4_tQ&m= > > >> gy9vDnDO2DIaWpZBxdO89w7Pxx5mZNKksKLbjB4_Yp4&s=XYsVEz3hq1Qyb8Oho_ > > >> z7iSlv88IVioUD4iIQ9gp_Trc&e= > > >> > > >> > > 229%3A+DeleteGroups+API > > >> > > So I'd like to start a vote. > > >> > > > > >> > > Thanks. > > >> > > --Vahid > > >> > > > > >> > > > > >> > > > >> > > >> > > >> > > >> Unless stated otherwise above: > > >> IBM United Kingdom Limited - Registered in England and Wales with number > > >> 741598. > > >> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 > > 3AU > > >> > > > > > > -- > -- Guozhang
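For illustration, deleting a group through the Java admin client would look roughly like this once the DeleteGroups API is in place (a sketch only; the method and error names follow the proposal and the admin-client conventions rather than a finalised API):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;

    public class DeleteGroupExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // A group that simply does not exist should surface as GROUP_ID_NOT_FOUND,
                // distinct from INVALID_GROUP_ID for a malformed group id.
                admin.deleteConsumerGroups(Collections.singletonList("my-group")).all().get();
            }
        }
    }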