[jira] [Created] (KAFKA-7981) Add Replica Fetcher and Log Cleaner Count Metrics

2019-02-22 Thread Viktor Somogyi-Vass (JIRA)
Viktor Somogyi-Vass created KAFKA-7981:
--

 Summary: Add Replica Fetcher and Log Cleaner Count Metrics
 Key: KAFKA-7981
 URL: https://issues.apache.org/jira/browse/KAFKA-7981
 Project: Kafka
  Issue Type: Improvement
  Components: metrics
Affects Versions: 2.3.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[DISCUSS] KIP-434: Add Replica Fetcher and Log Cleaner Count Metrics

2019-02-22 Thread Viktor Somogyi-Vass
Hi All,

I'd like to start a discussion about exposing gauge metrics for the
replica fetcher and log cleaner thread counts. It isn't a long KIP and the
motivation is very simple: monitoring the thread counts in these cases
would help with the investigation of various issues and might help in
preventing more serious problems when a broker is in a bad state. One
scenario we have seen with users is that their disk fills up because the log
cleaner died for some reason and couldn't recover (for example, due to log
corruption). In such a case an early warning would help in the root cause
analysis process as well as enable detecting and resolving the problem early on.
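
As a rough illustration of what such gauges could look like, here is a minimal sketch
using the Yammer metrics library the broker already relies on for JMX metrics; the
metric names and the cleanerThreads list are assumptions for the sketch, not the KIP's
actual implementation:

{code:java}
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Gauge;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative holder for the broker's log cleaner threads (names are assumptions).
public class LogCleanerMetricsSketch {
    private final List<Thread> cleanerThreads = new CopyOnWriteArrayList<>();

    public LogCleanerMetricsSketch() {
        // Total number of configured cleaner threads.
        Metrics.newGauge(LogCleanerMetricsSketch.class, "log-cleaner-thread-count",
            new Gauge<Integer>() {
                @Override
                public Integer value() {
                    return cleanerThreads.size();
                }
            });
        // Number of cleaner threads that are still alive.
        Metrics.newGauge(LogCleanerMetricsSketch.class, "log-cleaner-live-thread-count",
            new Gauge<Integer>() {
                @Override
                public Integer value() {
                    int alive = 0;
                    for (Thread t : cleanerThreads) {
                        if (t.isAlive()) alive++;
                    }
                    return alive;
                }
            });
    }
}
{code}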

The KIP is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-434%3A+Add+Replica+Fetcher+and+Log+Cleaner+Count+Metrics

I'd be happy to receive any feedback on this.

Regards,
Viktor


Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-22 Thread Viktor Somogyi-Vass
Hi Folks,

I also have pending work on incremental partition reassignment here:
https://issues.apache.org/jira/browse/KAFKA-6794
I think it would be good to cooperate on this to make both pieces of work
compatible with each other.

I'll write up a KIP about this today so it'll be easier to see how to fit
the two together. Basically, my work operates on the
/admin/reassign_partitions node in a fully compatible way, meaning I won't
change its format; I just calculate each increment based on it and the
current state of the ISR set for the partition being reassigned.
I hope we can collaborate on this.

Viktor

On Thu, Feb 21, 2019 at 9:04 PM Harsha  wrote:

> Thanks George. LGTM.
> Jun & Tom, Can you please take a look at the updated KIP.
> Thanks,
> Harsha
>
> On Wed, Feb 20, 2019, at 12:18 PM, George Li wrote:
> > Hi,
> >
> > After discussing with Tom, Harsha and I are picking up KIP-236 <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-236%3A+Interruptible+Partition+Reassignment>.
> The work focused on safely/cleanly cancel / rollback pending reassignments
> in a timely fashion. Pull Request #6296 <
> https://github.com/apache/kafka/pull/6296> Still working on more
> integration/system tests.
> >
> > Please review and provide feedbacks/suggestions.
> >
> > Thanks,
> > George
> >
> >
> > On Saturday, December 23, 2017, 0:51:13 GMT, Jun Rao 
> wrote:
> >
> > Hi, Tom,
>
> Thanks for the reply.
>
> 10. That's a good thought. Perhaps it's better to get rid of
> /admin/reassignment_requests
> too. The window when a controller is not available is small. So, we can
> just failed the admin client if the controller is not reachable after the
> timeout.
>
> 13. With the changes in 10, the old approach is handled through ZK callback
> and the new approach is through Kafka RPC. The ordering between the two is
> kind of arbitrary. Perhaps the ordering can just be based on the order that
> the reassignment is added to the controller request queue. From there, we
> can either do the overriding or the prevention.
>
> Jun
>
>
> On Fri, Dec 22, 2017 at 7:31 AM, Tom Bentley 
> wrote:
>
> > Hi Jun,
> >
> > Thanks for responding, my replies are inline:
> >
> > 10. You explanation makes sense. My remaining concern is the additional
> ZK
> > > writes in the proposal. With the proposal, we will need to do following
> > > writes in ZK.
> > >
> > > a. write new assignment in /admin/reassignment_requests
> > >
> > > b. write new assignment and additional metadata in
> > > /admin/reassignments/$topic/$partition
> > >
> > > c. write old + new assignment  in /brokers/topics/[topic]
> > >
> > > d. write new assignment in /brokers/topics/[topic]
> > >
> > > e. delete /admin/reassignments/$topic/$partition
> > >
> > > So, there are quite a few ZK writes. I am wondering if it's better to
> > > consolidate the info in /admin/reassignments/$topic/$partition into
> > > /brokers/topics/[topic].
> > > For example, we can just add some new JSON fields in
> > > /brokers/topics/[topic]
> > > to remember the new assignment and potentially the original replica
> count
> > > when doing step c. Those fields with then be removed in step d. That
> way,
> > > we can get rid of step b and e, saving 2 ZK writes per partition.
> > >
> >
> > This seems like a great idea to me.
> >
> > It might also be possible to get rid of the /admin/reassignment_requests
> > subtree too. I've not yet published the ideas I have for the AdminClient
> > API for reassigning partitions, but given the existence of such an API,
> the
> > route to starting a reassignment would be the AdminClient, and not
> > zookeeper. In that case there is no need for /admin/reassignment_requests
> > at all. The only drawback that I can see is that while it's currently
> > possible to trigger a reassignment even during a controller
> > election/failover that would no longer be the case if all requests had to
> > go via the controller.
> >
> >
> > > 11. What you described sounds good. We could potentially optimize the
> > > dropped replicas a bit more. Suppose that assignment [0,1,2] is first
> > > changed to [1,2,3] and then to [2,3,4]. When initiating the second
> > > assignment, we may end up dropping replica 3 and only to restart it
> > again.
> > > In this case, we could only drop a replica if it's not going to be
> added
> > > back again.
> > >
> >
> > I had missed that, thank you! I will update the proposed algorithm to
> > prevent this.
> >
> >
> > > 13. Since this is a corner case, we can either prevent or allow
> > overriding
> > > with old/new mechanisms. To me, it seems that allowing is simpler to
> > > implement, the order in /admin/reassignment_requests determines the
> > > ordering the of override, whether that's initiated by the new way or
> the
> > > old way.
> > >
> >
> > That makes sense except for the corner case where:
> >
> > * There is no current controller and
> > * Writes to both the new and old znodes happen
> >
> > On election of the new controller, for those p

[jira] [Created] (KAFKA-7982) ConcurrentModificationException and Continuous warnings "Attempting to send response via channel for which there is no open connection"

2019-02-22 Thread Abhi (JIRA)
Abhi created KAFKA-7982:
---

 Summary: ConcurrentModificationException and Continuous warnings 
"Attempting to send response via channel for which there is no open connection"
 Key: KAFKA-7982
 URL: https://issues.apache.org/jira/browse/KAFKA-7982
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.1
Reporter: Abhi


Hi,

I am getting the following warnings in server.log continuously, and due to this the 
client consumer is not able to consume messages.

[2019-02-20 10:26:30,312] WARN Attempting to send response via channel for 
which there is no open connection, connection id 
10.218.27.45:9092-10.219.25.239:35248-6259 (kafka.network.Processor)
[2019-02-20 10:26:56,760] WARN Attempting to send response via channel for 
which there is no open connection, connection id 
10.218.27.45:9092-10.219.25.239:35604-6261 (kafka.network.Processor)

I also noticed that before these warnings started to appear, the following 
concurrent modification exception occurred for the same IP address:

[2019-02-20 09:01:11,175] INFO Initiating logout for 
kafka/u-kafkatst-kafkadev-1.sd@unix.com 
(org.apache.kafka.common.security.kerberos.KerberosLogin)
[2019-02-20 09:01:11,176] WARN [SocketServer brokerId=1] Unexpected error from 
/10.219.25.239; closing connection (org.apache.kafka.common.network.Selector)
java.util.ConcurrentModificationException
at 
java.base/java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:970)
at java.base/java.util.LinkedList$ListItr.next(LinkedList.java:892)
at 
java.base/javax.security.auth.Subject$SecureSet$1.next(Subject.java:1096)
at 
java.base/javax.security.auth.Subject$ClassSet$1.run(Subject.java:1501)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at 
java.base/javax.security.auth.Subject$ClassSet.populateSet(Subject.java:1499)
at 
java.base/javax.security.auth.Subject$ClassSet.(Subject.java:1472)
at 
java.base/javax.security.auth.Subject.getPrivateCredentials(Subject.java:764)
at java.security.jgss/sun.security.jgss.GSSUtil$1.run(GSSUtil.java:336)
at java.security.jgss/sun.security.jgss.GSSUtil$1.run(GSSUtil.java:328)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at 
java.security.jgss/sun.security.jgss.GSSUtil.searchSubject(GSSUtil.java:328)
at 
java.security.jgss/sun.security.jgss.wrapper.NativeGSSFactory.getCredFromSubject(NativeGSSFactory.java:53)
at 
java.security.jgss/sun.security.jgss.wrapper.NativeGSSFactory.getCredentialElement(NativeGSSFactory.java:116)
at 
java.security.jgss/sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:187)
at 
java.security.jgss/sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:439)
at 
java.security.jgss/sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:74)
at 
java.security.jgss/sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:148)
at 
jdk.security.jgss/com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108)
at 
jdk.security.jgss/com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
at 
java.security.sasl/javax.security.sasl.Sasl.createSaslServer(Sasl.java:537)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.lambda$createSaslKerberosServer$12(SaslServerAuthenticator.java:212)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.createSaslKerberosServer(SaslServerAuthenticator.java:211)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.createSaslServer(SaslServerAuthenticator.java:164)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.handleKafkaRequest(SaslServerAuthenticator.java:450)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:248)
at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:132)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:532)
at org.apache.kafka.common.network.Selector.poll(Selector.java:467)
at kafka.network.Processor.poll(SocketServer.scala:689)
at kafka.network.Processor.run(SocketServer.scala:594)
at java.base/java.lang.Thread.run(Thread.java:834)
[2019-02-22 00:18:29,439] INFO Initiating re-login for 
kafka/u-kafkatst-kafkadev-1.sd.deshaw@unix.deshaw.com 
(org.apache.kafka.common.security.kerberos.KerberosLogin)
[2019-02-22 00:18:29,440] WARN [SocketServer brokerId=1] Unexpected error from 
/10.219.25.239; closing connection (org.

Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-22 Thread Viktor Somogyi-Vass
I've published the above-mentioned KIP here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-435%3A+Incremental+Partition+Reassignment
Will start a discussion about it soon.

On Fri, Feb 22, 2019 at 12:45 PM Viktor Somogyi-Vass <
viktorsomo...@gmail.com> wrote:

> Hi Folks,
>
> I also have a pending active work on the incremental partition
> reassignment stuff here: https://issues.apache.org/jira/browse/KAFKA-6794
> I think it would be good to cooperate on this to make both work compatible
> with each other.
>
> I'll write up a KIP about this today so it'll be easier to see how to fit
> the two together. Basically in my work I operate on the
> /admin/reassign_partitions node on a fully compatible way, meaning I won't
> change it just calculate each increment based on that and the current state
> of the ISR set for the partition in reassignment.
> I hope we could collaborate on this.
>
> Viktor
>
> On Thu, Feb 21, 2019 at 9:04 PM Harsha  wrote:
>
>> Thanks George. LGTM.
>> Jun & Tom, Can you please take a look at the updated KIP.
>> Thanks,
>> Harsha
>>
>> On Wed, Feb 20, 2019, at 12:18 PM, George Li wrote:
>> > Hi,
>> >
>> > After discussing with Tom, Harsha and I are picking up KIP-236 <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-236%3A+Interruptible+Partition+Reassignment>.
>> The work focused on safely/cleanly cancel / rollback pending reassignments
>> in a timely fashion. Pull Request #6296 <
>> https://github.com/apache/kafka/pull/6296> Still working on more
>> integration/system tests.
>> >
>> > Please review and provide feedbacks/suggestions.
>> >
>> > Thanks,
>> > George
>> >
>> >
>> > On Saturday, December 23, 2017, 0:51:13 GMT, Jun Rao 
>> wrote:
>> >
>> > Hi, Tom,
>>
>> Thanks for the reply.
>>
>> 10. That's a good thought. Perhaps it's better to get rid of
>> /admin/reassignment_requests
>> too. The window when a controller is not available is small. So, we can
>> just failed the admin client if the controller is not reachable after the
>> timeout.
>>
>> 13. With the changes in 10, the old approach is handled through ZK
>> callback
>> and the new approach is through Kafka RPC. The ordering between the two is
>> kind of arbitrary. Perhaps the ordering can just be based on the order
>> that
>> the reassignment is added to the controller request queue. From there, we
>> can either do the overriding or the prevention.
>>
>> Jun
>>
>>
>> On Fri, Dec 22, 2017 at 7:31 AM, Tom Bentley 
>> wrote:
>>
>> > Hi Jun,
>> >
>> > Thanks for responding, my replies are inline:
>> >
>> > 10. You explanation makes sense. My remaining concern is the additional
>> ZK
>> > > writes in the proposal. With the proposal, we will need to do
>> following
>> > > writes in ZK.
>> > >
>> > > a. write new assignment in /admin/reassignment_requests
>> > >
>> > > b. write new assignment and additional metadata in
>> > > /admin/reassignments/$topic/$partition
>> > >
>> > > c. write old + new assignment  in /brokers/topics/[topic]
>> > >
>> > > d. write new assignment in /brokers/topics/[topic]
>> > >
>> > > e. delete /admin/reassignments/$topic/$partition
>> > >
>> > > So, there are quite a few ZK writes. I am wondering if it's better to
>> > > consolidate the info in /admin/reassignments/$topic/$partition into
>> > > /brokers/topics/[topic].
>> > > For example, we can just add some new JSON fields in
>> > > /brokers/topics/[topic]
>> > > to remember the new assignment and potentially the original replica
>> count
>> > > when doing step c. Those fields with then be removed in step d. That
>> way,
>> > > we can get rid of step b and e, saving 2 ZK writes per partition.
>> > >
>> >
>> > This seems like a great idea to me.
>> >
>> > It might also be possible to get rid of the /admin/reassignment_requests
>> > subtree too. I've not yet published the ideas I have for the AdminClient
>> > API for reassigning partitions, but given the existence of such an API,
>> the
>> > route to starting a reassignment would be the AdminClient, and not
>> > zookeeper. In that case there is no need for
>> /admin/reassignment_requests
>> > at all. The only drawback that I can see is that while it's currently
>> > possible to trigger a reassignment even during a controller
>> > election/failover that would no longer be the case if all requests had
>> to
>> > go via the controller.
>> >
>> >
>> > > 11. What you described sounds good. We could potentially optimize the
>> > > dropped replicas a bit more. Suppose that assignment [0,1,2] is first
>> > > changed to [1,2,3] and then to [2,3,4]. When initiating the second
>> > > assignment, we may end up dropping replica 3 and only to restart it
>> > again.
>> > > In this case, we could only drop a replica if it's not going to be
>> added
>> > > back again.
>> > >
>> >
>> > I had missed that, thank you! I will update the proposed algorithm to
>> > prevent this.
>> >
>> >
>> > > 13. Since this is a corner case, we can either prevent or allow
>> > overriding
>> > > with old/new mechanisms. To me

[DISCUSS] KIP-435: Incremental Partition Reassignment

2019-02-22 Thread Viktor Somogyi-Vass
Hi Folks,

I've created a KIP about an improvement to our reassignment algorithm. It
aims to enable partition-wise incremental reassignment. The motivation for
this is to avoid the excess load that the current replication algorithm
implicitly carries: there are points in the algorithm where both the new and
the old replica sets could be online and replicating, which puts double (or
almost double) pressure on the brokers and could cause problems.
Instead, my proposal would slice the reassignment up into several steps,
where each step is calculated based on the final target replicas and the
current replica assignment, taking into account scenarios where brokers
could be offline and where there are not enough replicas to fulfil the
min.insync.replicas requirement.
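
To make the incremental idea concrete, here is a toy sketch of computing one step:
add at most one missing target replica and drop one replica that is not in the
target, so a partition never carries the full old + new replica set at once. This
only illustrates the principle and ignores offline brokers and min.insync.replicas,
which the KIP takes into account; it is not the KIP's actual algorithm.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class IncrementalStepSketch {
    // Compute the next incremental replica assignment for a single partition.
    public static List<Integer> nextStep(List<Integer> current, List<Integer> target) {
        Set<Integer> step = new LinkedHashSet<>(current);
        // Add the first target replica that is not yet assigned.
        for (int broker : target) {
            if (!step.contains(broker)) {
                step.add(broker);
                break;
            }
        }
        // Drop one replica that is not part of the target, keeping the size bounded.
        for (int broker : current) {
            if (!target.contains(broker) && step.size() > target.size()) {
                step.remove(broker);
                break;
            }
        }
        return new ArrayList<>(step);
    }

    public static void main(String[] args) {
        // Moving [0,1,2] to [3,4,5] proceeds as [0,1,2] -> [1,2,3] -> [2,3,4] -> [3,4,5].
        System.out.println(nextStep(Arrays.asList(0, 1, 2), Arrays.asList(3, 4, 5))); // [1, 2, 3]
    }
}
{code}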

The link to the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-435%3A+Incremental+Partition+Reassignment

I'd be happy to receive any feedback.

An important note is that this KIP and another one, KIP-236 that is about
interruptible reassignment (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-236%3A+Interruptible+Partition+Reassignment)
should be compatible.

Thanks,
Viktor


Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-22 Thread Viktor Somogyi-Vass
I've read through the KIP and I have one comment:

It seems like it does not strictly do cancellation but also implements
rolling back to the original assignment. I think it'd be much simpler to
generate a reassignment json on cancellation that contains the original
assignment and start a completely new partition reassignment with it. This
way the reassignment algorithm (whatever it is) could be reused as a whole.
Did you consider this, or are there any obstacles that prevent doing this?

Regards,
Viktor
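
For reference, the rollback json Viktor suggests would just be the standard
/admin/reassign_partitions format carrying the partitions' original replica lists;
the topic name and broker ids below are made up for illustration:

{code}
{
  "version": 1,
  "partitions": [
    { "topic": "my-topic", "partition": 0, "replicas": [1, 2, 3] },
    { "topic": "my-topic", "partition": 1, "replicas": [2, 3, 4] }
  ]
}
{code}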

On Fri, Feb 22, 2019 at 2:24 PM Viktor Somogyi-Vass 
wrote:

> I've published the above mentioned KIP here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-435%3A+Incremental+Partition+Reassignment
> Will start a discussion about it soon.
>
> On Fri, Feb 22, 2019 at 12:45 PM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I also have a pending active work on the incremental partition
>> reassignment stuff here: https://issues.apache.org/jira/browse/KAFKA-6794
>> I think it would be good to cooperate on this to make both work
>> compatible with each other.
>>
>> I'll write up a KIP about this today so it'll be easier to see how to fit
>> the two together. Basically in my work I operate on the
>> /admin/reassign_partitions node on a fully compatible way, meaning I won't
>> change it just calculate each increment based on that and the current state
>> of the ISR set for the partition in reassignment.
>> I hope we could collaborate on this.
>>
>> Viktor
>>
>> On Thu, Feb 21, 2019 at 9:04 PM Harsha  wrote:
>>
>>> Thanks George. LGTM.
>>> Jun & Tom, Can you please take a look at the updated KIP.
>>> Thanks,
>>> Harsha
>>>
>>> On Wed, Feb 20, 2019, at 12:18 PM, George Li wrote:
>>> > Hi,
>>> >
>>> > After discussing with Tom, Harsha and I are picking up KIP-236 <
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-236%3A+Interruptible+Partition+Reassignment>.
>>> The work focused on safely/cleanly cancel / rollback pending reassignments
>>> in a timely fashion. Pull Request #6296 <
>>> https://github.com/apache/kafka/pull/6296> Still working on more
>>> integration/system tests.
>>> >
>>> > Please review and provide feedbacks/suggestions.
>>> >
>>> > Thanks,
>>> > George
>>> >
>>> >
>>> > On Saturday, December 23, 2017, 0:51:13 GMT, Jun Rao 
>>> wrote:
>>> >
>>> > Hi, Tom,
>>>
>>> Thanks for the reply.
>>>
>>> 10. That's a good thought. Perhaps it's better to get rid of
>>> /admin/reassignment_requests
>>> too. The window when a controller is not available is small. So, we can
>>> just failed the admin client if the controller is not reachable after the
>>> timeout.
>>>
>>> 13. With the changes in 10, the old approach is handled through ZK
>>> callback
>>> and the new approach is through Kafka RPC. The ordering between the two
>>> is
>>> kind of arbitrary. Perhaps the ordering can just be based on the order
>>> that
>>> the reassignment is added to the controller request queue. From there, we
>>> can either do the overriding or the prevention.
>>>
>>> Jun
>>>
>>>
>>> On Fri, Dec 22, 2017 at 7:31 AM, Tom Bentley 
>>> wrote:
>>>
>>> > Hi Jun,
>>> >
>>> > Thanks for responding, my replies are inline:
>>> >
>>> > 10. You explanation makes sense. My remaining concern is the
>>> additional ZK
>>> > > writes in the proposal. With the proposal, we will need to do
>>> following
>>> > > writes in ZK.
>>> > >
>>> > > a. write new assignment in /admin/reassignment_requests
>>> > >
>>> > > b. write new assignment and additional metadata in
>>> > > /admin/reassignments/$topic/$partition
>>> > >
>>> > > c. write old + new assignment  in /brokers/topics/[topic]
>>> > >
>>> > > d. write new assignment in /brokers/topics/[topic]
>>> > >
>>> > > e. delete /admin/reassignments/$topic/$partition
>>> > >
>>> > > So, there are quite a few ZK writes. I am wondering if it's better to
>>> > > consolidate the info in /admin/reassignments/$topic/$partition into
>>> > > /brokers/topics/[topic].
>>> > > For example, we can just add some new JSON fields in
>>> > > /brokers/topics/[topic]
>>> > > to remember the new assignment and potentially the original replica
>>> count
>>> > > when doing step c. Those fields with then be removed in step d. That
>>> way,
>>> > > we can get rid of step b and e, saving 2 ZK writes per partition.
>>> > >
>>> >
>>> > This seems like a great idea to me.
>>> >
>>> > It might also be possible to get rid of the
>>> /admin/reassignment_requests
>>> > subtree too. I've not yet published the ideas I have for the
>>> AdminClient
>>> > API for reassigning partitions, but given the existence of such an
>>> API, the
>>> > route to starting a reassignment would be the AdminClient, and not
>>> > zookeeper. In that case there is no need for
>>> /admin/reassignment_requests
>>> > at all. The only drawback that I can see is that while it's currently
>>> > possible to trigger a reassignment even during a controller
>>> > election/failover that would no longer be the case if all requests had
>>> to
>>> 

Re: [DISCUSS] KIP-432: Additional Broker-Side Opt-In for Default, Unsecure SASL/OAUTHBEARER Implementation

2019-02-22 Thread Stanislav Kozlovski
That's an interesting approach - it looks good to me at first glance.
Should we consider creating a new KIP to initiate a proper discussion?
People might miss the discussion here due to the KIP title.

On Thu, Feb 21, 2019 at 6:52 PM Ron Dagostino  wrote:

> My gut says a building block that takes a configuration as input and
> exposes a representation of the potential issues found would be a good
> starting point.  Then this could be leveraged as part of a command line
> tool that emits the data to stdout, or it could be used during broker
> startup such that any potential issues found are logged (or, more
> aggressively, cause the broker to not start if that is desired).  The
> return could be something as simple as an
> EnumSet:
>
> enum  PotentialConfigSecurityIssue {
> OAUTHBEARER_UNSECURE_TOKENS_ALLOWED, // when OAUTHBEARER is in
> sasl.enabled.mechanisms but server callback handler is the unsecured
> validator
> PLAINTEXT_LISTENER, // if there is a PLAINTEXT listener
> SSL_CLIENT_AUTH_NOT_REQUIRED, // when SSL listener is enabled but
> ssl.client.auth is not required
> ETC
> }
>
> Or it could be a Set of instances where each instance implements an
> interface that returns one of these enum values via a getType() method;
> this allows each instance to potentially be a different class and hold
> additional information (like the PLAINTEXT listener's address in case that
> is something that requires additional validation, though how the validation
> would be plugged in is a separate issue that I can't figure out at the
> moment).
>
> It feels to me that the EnumSet<> approach is the simplest place to start,
> and it might be best to allow anything more complicated to fall out as
> experience with the simple starting point builds.  Making the EnumSet<>
> part of the public API would then allow anyone to build upon it as they see
> fit.
>
> Ron
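
As a rough sketch of the building block Ron describes above: a function that takes a
broker config map and returns the set of potential issues found. The specific checks
and config lookups below are illustrative assumptions, not an agreed-upon list.

{code:java}
import java.util.EnumSet;
import java.util.Map;

public class ConfigSecurityScanner {
    enum PotentialConfigSecurityIssue {
        OAUTHBEARER_UNSECURED_TOKENS_ALLOWED,
        PLAINTEXT_LISTENER,
        SSL_CLIENT_AUTH_NOT_REQUIRED
    }

    // Scan a broker config map and report potential security issues found.
    public static EnumSet<PotentialConfigSecurityIssue> scan(Map<String, String> config) {
        EnumSet<PotentialConfigSecurityIssue> issues =
            EnumSet.noneOf(PotentialConfigSecurityIssue.class);
        String listeners = config.getOrDefault("listeners", "PLAINTEXT://:9092");
        if (listeners.contains("PLAINTEXT://")) {
            issues.add(PotentialConfigSecurityIssue.PLAINTEXT_LISTENER);
        }
        if (listeners.contains("SSL://")
                && !"required".equals(config.getOrDefault("ssl.client.auth", "none"))) {
            issues.add(PotentialConfigSecurityIssue.SSL_CLIENT_AUTH_NOT_REQUIRED);
        }
        // Assumes OAUTHBEARER on a listener named SASL_SSL; a real check would cover all listeners.
        if (config.getOrDefault("sasl.enabled.mechanisms", "").contains("OAUTHBEARER")
                && !config.containsKey(
                    "listener.name.sasl_ssl.oauthbearer.sasl.server.callback.handler.class")) {
            issues.add(PotentialConfigSecurityIssue.OAUTHBEARER_UNSECURED_TOKENS_ALLOWED);
        }
        return issues;
    }
}
{code}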
>
>
>
> On Thu, Feb 21, 2019 at 12:57 PM Stanislav Kozlovski <
> stanis...@confluent.io>
> wrote:
>
> > I think that is a solid idea. The closest thing I can think of is David's
> > PR about duplicate config key logging -
> > https://github.com/apache/kafka/pull/6104
> >
> > We could either continue the pattern of checking on broker startup and
> > logging a warning or create a separate tool that analyzes the configs.
> >
> > On Thu, Feb 21, 2019 at 3:16 PM Rajini Sivaram 
> > wrote:
> >
> > > Hi Ron,
> > >
> > > Yes, a security sanity check tool could be quite useful. Let's see what
> > > others think.
> > >
> > > Thanks,
> > >
> > > Rajini
> > >
> > >
> > > On Thu, Feb 21, 2019 at 1:49 PM Ron Dagostino 
> wrote:
> > >
> > > > HI Rajini and Stan.  Thanks for the feedback.
> > > >
> > > > Stan, regarding the proposed config name, I couldn't think of
> anything
> > > so I
> > > > just threw in something outrageous in the hopes that it would give a
> > > sense
> > > > of what I was talking about while perhaps making folks chuckle a bit.
> > > >
> > > > Rajini, I definitely see your point.  It probably doesn't make sense
> to
> > > > address this one particular issue (if we can even consider it an
> issue)
> > > > when in fact it is part of a pattern that has been explicitly agreed
> > upon
> > > > as being appropriate.
> > > >
> > > > Maybe a security sanity check tool that scans the config and flags
> any
> > of
> > > > these items you mentioned, plus the OAUTHBEARER one and any others we
> > can
> > > > think of, would be useful?  That way the out-of-the-box experience
> can
> > > > remain straightforward while some of the security risk that comes as
> a
> > > > byproduct can be mitigated.
> > > >
> > > > Ron
> > > >
> > > > Ron
> > > >
> > > > On Thu, Feb 21, 2019 at 8:02 AM Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for the KIP. How is this different from other scenarios:
> > > > >
> > > > >1. Our default is to use a PLAINTEXT listener. If you forget to
> > > change
> > > > >that, anyone has access to your cluster
> > > > >2. You may add a PLAINTEXT listener to the list of listeners in
> > > > >production. May be you intended it for an interface that was
> > > protected
> > > > >using network segmentation, but entered the wrong address.
> > > > >3. You are very security conscious and add an SSL listener. You
> > must
> > > > be
> > > > >secure now right? Our default is `ssl.client.auth=none`, which
> > means
> > > > any
> > > > >one can connect.
> > > > >4. You use the built-in insecure PLAIN callback that stores
> > > cleartext
> > > > >passwords on the file system. Or enable SASL/PLAIN without SSL.
> > > > >
> > > > > At the moment, our defaults are intended to make it easy to get
> > started
> > > > > quickly. If we want to make brokers secure by default, we need an
> > > > approach
> > > > > that works across the board. I am not sure we have a specific issue
> > > with
> > > > > OAUTHBEARER apart from the fact that we don't provide a s

Re: [DISCUSS] KIP-434: Add Replica Fetcher and Log Cleaner Count Metrics

2019-02-22 Thread Stanislav Kozlovski
Hey Viktor,

First off, thanks for the KIP! I think that it is almost always a good idea
to have more metrics. Observability never hurts.

In regards to the LogCleaner:
* Do we need to know log-cleaner-thread-count? That should always be equal
to "log.cleaner.threads" if I'm not mistaken.
* log-cleaner-current-live-thread-rate - We already have the
"time-since-last-run-ms" metric, which can let you know if something is
wrong with the log cleaning.
As you said, we would like to have these two new metrics in order to
understand when a partial failure has happened - e.g. only 1/3 of the log
cleaner threads are alive. I'm wondering if it may make more sense to either:
a) restart the threads when they die
b) add a metric which shows the dead thread count. You should probably
always have a low-level alert in case any threads have died.
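
A minimal sketch of option (b), assuming the Yammer metrics registry and an
illustrative metric name; the real wiring inside the LogCleaner would differ:

{code:java}
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Gauge;
import java.util.concurrent.atomic.AtomicInteger;

public class DeadThreadCounterSketch {
    private final AtomicInteger deadThreads = new AtomicInteger(0);

    public DeadThreadCounterSketch() {
        // Alert whenever this gauge is greater than zero.
        Metrics.newGauge(DeadThreadCounterSketch.class, "log-cleaner-dead-thread-count",
            new Gauge<Integer>() {
                @Override
                public Integer value() {
                    return deadThreads.get();
                }
            });
    }

    // Wrap each cleaner thread so an uncaught exception bumps the counter.
    public Thread newCleanerThread(Runnable cleanerLoop, int id) {
        Thread t = new Thread(cleanerLoop, "log-cleaner-thread-" + id);
        t.setUncaughtExceptionHandler((thread, error) -> deadThreads.incrementAndGet());
        return t;
    }
}
{code}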

We had discussed a similar topic about thread revival and metrics in
KIP-346. Have you had a chance to look over that discussion? Here is the
mailing discussion for that -
http://mail-archives.apache.org/mod_mbox/kafka-dev/201807.mbox/%3ccanzzngyr_22go9swl67hedcm90xhvpyfzy_tezhz1mrizqk...@mail.gmail.com%3E

Best,
Stanislav



On Fri, Feb 22, 2019 at 11:18 AM Viktor Somogyi-Vass <
viktorsomo...@gmail.com> wrote:

> Hi All,
>
> I'd like to start a discussion about exposing count gauge metrics for the
> replica fetcher and log cleaner thread counts. It isn't a long KIP and the
> motivation is very simple: monitoring the thread counts in these cases
> would help with the investigation of various issues and might help in
> preventing more serious issues when a broker is in a bad state. Such a
> scenario that we seen with users is that their disk fills up as the log
> cleaner died for some reason and couldn't recover (like log corruption). In
> this case an early warning would help in the root cause analysis process as
> well as enable detecting and resolving the problem early on.
>
> The KIP is here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-434%3A+Add+Replica+Fetcher+and+Log+Cleaner+Count+Metrics
>
> I'd be happy to receive any feedback on this.
>
> Regards,
> Viktor
>


-- 
Best,
Stanislav


Re: [DISCUSS] KIP-435: Incremental Partition Reassignment

2019-02-22 Thread Harsha
Hi Viktor,
There is already KIP-236 for the same feature and George made a PR 
for this as well.
Lets consolidate these two discussions. If you have any cases that are not 
being solved by KIP-236 can you please mention them in that thread. We can 
address as part of KIP-236.

Thanks,
Harsha

On Fri, Feb 22, 2019, at 5:44 AM, Viktor Somogyi-Vass wrote:
> Hi Folks,
> 
> I've created a KIP about an improvement of the reassignment algorithm we
> have. It aims to enable partition-wise incremental reassignment. The
> motivation for this is to avoid excess load that the current replication
> algorithm implicitly carries as in that case there are points in the
> algorithm where both the new and old replica set could be online and
> replicating which puts double (or almost double) pressure on the brokers
> which could cause problems.
> Instead my proposal would slice this up into several steps where each step
> is calculated based on the final target replicas and the current replica
> assignment taking into account scenarios where brokers could be offline and
> when there are not enough replicas to fulfil the min.insync.replica
> requirement.
> 
> The link to the KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-435%3A+Incremental+Partition+Reassignment
> 
> I'd be happy to receive any feedback.
> 
> An important note is that this KIP and another one, KIP-236 that is 
> about
> interruptible reassignment (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-236%3A+Interruptible+Partition+Reassignment)
> should be compatible.
> 
> Thanks,
> Viktor
>


[jira] [Resolved] (KAFKA-7864) AdminZkClient.validateTopicCreate() should validate that partitions are 0-based

2019-02-22 Thread Jun Rao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-7864.

   Resolution: Fixed
Fix Version/s: 2.3.0

Merged the PR to trunk.

> AdminZkClient.validateTopicCreate() should validate that partitions are 
> 0-based
> ---
>
> Key: KAFKA-7864
> URL: https://issues.apache.org/jira/browse/KAFKA-7864
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Assignee: Ryan
>Priority: Major
>  Labels: newbie
> Fix For: 2.3.0
>
>
> AdminZkClient.validateTopicCreate() currently doesn't validate that partition 
> ids in a topic are consecutive, starting from 0. The client code depends on 
> that. So, it would be useful to tighten up the check.
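> 
> A sketch of the tightened check (method and class names are illustrative, not the 
> actual Kafka code):
> 
> {code:java}
> import java.util.Map;
> import java.util.Set;
> 
> public class PartitionIdValidation {
>     // Partition ids must be exactly 0..n-1 for n partitions.
>     public static void validateConsecutiveFromZero(Map<Integer, ?> assignments) {
>         Set<Integer> ids = assignments.keySet();
>         for (int i = 0; i < ids.size(); i++) {
>             if (!ids.contains(i)) {
>                 throw new IllegalArgumentException(
>                     "Partition ids must be consecutive starting from 0; missing id " + i + " in " + ids);
>             }
>         }
>     }
> }
> {code}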



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7492) Explain `null` handling for reduce and aggregate

2019-02-22 Thread Bill Bejeck (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck resolved KAFKA-7492.

Resolution: Fixed

Thanks asutosh for the contribution!

> Explain `null` handling for reduce and aggregate
> 
>
> Key: KAFKA-7492
> URL: https://issues.apache.org/jira/browse/KAFKA-7492
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation, streams
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Priority: Minor
>  Labels: beginner, docs, newbie
>
> We currently don't explain how records with `null` value are handled in 
> reduce and aggregate. In particular, what happens when the users' 
> aggregation/reduce `apply()` implementation returns `null`.
> We should update the JavaDocs accordingly and maybe also update the docs on 
> the web page.
> Cf. 
> https://stackoverflow.com/questions/52692202/what-happens-if-the-aggregator-of-a-kgroupedstream-returns-null



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-236 Interruptible Partition Reassignment

2019-02-22 Thread George Li
 Hi Viktor,

Thanks for reading and providing feedback on KIP-236.

For reassignments, one can generate a json for the new assignments and another json 
with the "original" assignments for rollback purposes. In a production cluster, from 
our experience, we need to submit the reassignments in batches with 
throttling/staggering to minimize the impact on the cluster. Some large 
topic/partitions, coupled with throttling, can take a pretty long time for the new 
replicas to get into the ISR and complete the reassignment in that batch. Currently, 
Kafka does not allow cancelling the pending reassignments cleanly during this. Even 
if you have the json with the "original" assignments for rollback, you have to wait 
for the current reassignment to complete and then submit it as a new reassignment to 
roll back. If the current reassignment is causing impact to production, we would 
like the reassignments to be cancelled/rolled back cleanly/safely/quickly. This 
is the main goal of KIP-236.

The original KIP-236 by Tom Bentley also proposed incremental reassignments, 
i.e. submitting new reassignments while the current reassignment is still going on. 
This has been scaled back and put under the "Planned Future Changes" section of 
KIP-236, so we can get this Reassignment Cancellation/Rollback feature out to the 
community sooner.

The main idea of incremental reassignment is to allow submitting new reassignments 
to another ZK node, /admin/reassign_partitions_queue, and merging them with the 
current pending reassignments in /admin/reassign_partitions. In case the same 
topic/partition appears in both ZK nodes, the conflict resolution is to cancel the 
current reassignment in /admin/reassign_partitions and move the same 
topic/partition over from /admin/reassign_partitions_queue as the new reassignment.
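
As a toy illustration of that merge rule (topic-partitions are plain strings here 
purely for brevity, and the method names are assumptions):

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReassignmentMergeSketch {
    public static Map<String, List<Integer>> merge(
            Map<String, List<Integer>> pending,   // /admin/reassign_partitions
            Map<String, List<Integer>> queued) {  // /admin/reassign_partitions_queue
        Map<String, List<Integer>> merged = new HashMap<>(pending);
        for (Map.Entry<String, List<Integer>> entry : queued.entrySet()) {
            // Same topic/partition in both nodes: the current reassignment is
            // cancelled and the queued one becomes the new reassignment.
            merged.put(entry.getKey(), entry.getValue());
        }
        return merged;
    }
}
{code}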

If there is enough interest from the community, these "Planned Future Changes" 
for incremental reassignments can also be delivered in KIP-236; otherwise, in 
another KIP. The current PR, https://github.com/apache/kafka/pull/6296, only 
focuses on/addresses the pending Reassignment Cancellation/Rollback.

Hope this answers your questions. 

Thanks,
George

On Friday, February 22, 2019, 6:51:14 AM PST, Viktor Somogyi-Vass 
 wrote:  
 
 Read through the KIP and I have one comment:

It seems like it is not looking strictly for cancellation but also
implements rolling back to the original. I think it'd be much simpler to
generate a reassignment json on cancellation that contains the original
assignment and start a new partition reassignment completely. This way the
reassignment algorithm (whatever it is) could be reused as a whole. Did you
consider this or are there any obstacles that prevents doing this?

Regards,
Viktor

On Fri, Feb 22, 2019 at 2:24 PM Viktor Somogyi-Vass 
wrote:

> I've published the above mentioned KIP here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-435%3A+Incremental+Partition+Reassignment
> Will start a discussion about it soon.
>
> On Fri, Feb 22, 2019 at 12:45 PM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I also have a pending active work on the incremental partition
>> reassignment stuff here: https://issues.apache.org/jira/browse/KAFKA-6794
>> I think it would be good to cooperate on this to make both work
>> compatible with each other.
>>
>> I'll write up a KIP about this today so it'll be easier to see how to fit
>> the two together. Basically in my work I operate on the
>> /admin/reassign_partitions node on a fully compatible way, meaning I won't
>> change it just calculate each increment based on that and the current state
>> of the ISR set for the partition in reassignment.
>> I hope we could collaborate on this.
>>
>> Viktor
>>
>> On Thu, Feb 21, 2019 at 9:04 PM Harsha  wrote:
>>
>>> Thanks George. LGTM.
>>> Jun & Tom, Can you please take a look at the updated KIP.
>>> Thanks,
>>> Harsha
>>>
>>> On Wed, Feb 20, 2019, at 12:18 PM, George Li wrote:
>>> > Hi,
>>> >
>>> > After discussing with Tom, Harsha and I are picking up KIP-236 <
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-236%3A+Interruptible+Partition+Reassignment>.
>>> The work focused on safely/cleanly cancel / rollback pending reassignments
>>> in a timely fashion. Pull Request #6296 <
>>> https://github.com/apache/kafka/pull/6296> Still working on more
>>> integration/system tests.
>>> >
>>> > Please review and provide feedbacks/suggestions.
>>> >
>>> > Thanks,
>>> > George
>>> >
>>> >
>>> > On Saturday, December 23, 2017, 0:51:13 GMT, Jun Rao 
>>> wrote:
>>> >
>>> > Hi, Tom,
>>>
>>> Thanks for the reply.
>>>
>>> 10. That's a good thought. Perhaps it's better to get rid of
>>> /admin/reassignment_requests
>>> too. The window when a controller is not available is small. So, we can
>>> just failed the admin client if the controller is not reachable after the
>>> timeout.
>>>
>>> 13. With the changes in 10, the old approach is handled through ZK
>>> callback
>>> and the new approach is through Kafka RPC. The ordering between the two
>>> is
>>> kind of arbitr

[jira] [Created] (KAFKA-7983) supporting replication.throttled.replicas in dynamic broker configuration

2019-02-22 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-7983:
--

 Summary: supporting replication.throttled.replicas in dynamic 
broker configuration
 Key: KAFKA-7983
 URL: https://issues.apache.org/jira/browse/KAFKA-7983
 Project: Kafka
  Issue Type: New Feature
  Components: core
Reporter: Jun Rao


In 
[KIP-226|https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration#KIP-226-DynamicBrokerConfiguration-DefaultTopicconfigs],
 we added support for changing broker defaults dynamically. However, it didn't 
support changing leader.replication.throttled.replicas and 
follower.replication.throttled.replicas. These 2 configs were introduced in 
[KIP-73|https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas]
 and control the set of topic partitions on which replication throttling will 
be engaged. One useful case is to be able to set a default value of * for both 
configs to allow throttling to be engaged for all topic partitions. 
Currently, the static default values for both configs are ignored for 
replication throttling; it would be useful to fix that as well.
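
A sketch of what the requested capability might look like once supported, using the 
AdminClient with an empty broker id to denote the cluster-wide dynamic default; today 
these two configs are not accepted as dynamic broker configs, which is exactly what 
this ticket asks to change:

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class DefaultThrottledReplicasSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        AdminClient admin = AdminClient.create(props);
        try {
            // Empty broker id = cluster-wide default broker config (per KIP-226).
            ConfigResource clusterDefault = new ConfigResource(ConfigResource.Type.BROKER, "");
            Config config = new Config(Arrays.asList(
                new ConfigEntry("leader.replication.throttled.replicas", "*"),
                new ConfigEntry("follower.replication.throttled.replicas", "*")));
            admin.alterConfigs(Collections.singletonMap(clusterDefault, config)).all().get();
        } finally {
            admin.close();
        }
    }
}
{code}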



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7984) Do not rebuild leader epochs on segments that do not support it

2019-02-22 Thread Stanislav Kozlovski (JIRA)
Stanislav Kozlovski created KAFKA-7984:
--

 Summary: Do not rebuild leader epochs on segments that do not 
support it
 Key: KAFKA-7984
 URL: https://issues.apache.org/jira/browse/KAFKA-7984
 Project: Kafka
  Issue Type: Bug
Reporter: Stanislav Kozlovski
Assignee: Stanislav Kozlovski


h3. Preface

https://issues.apache.org/jira/browse/KAFKA-7897 (logs would store some leader 
epochs even if they did not support them - this is essentially a regression 
from https://issues.apache.org/jira/browse/KAFKA-7415)
https://issues.apache.org/jira/browse/KAFKA-7959

If users are running Kafka with 
https://issues.apache.org/jira/browse/KAFKA-7415 merged in, chances are they 
have sparsely-populated leader epoch cache files.
KAFKA-7897's implementation unintentionally handles deleting those 
leader epoch cache files for versions 2.1+. For versions below that, KAFKA-7959 
fixes it.

In any case, as it currently stands, a broker started up with a message format 
of `0.10.0` will have those leader epoch cache files deleted.


h3. Problem

We have logic [that rebuilds these leader epoch cache 
files|https://github.com/apache/kafka/blob/217f45ed554b34d5221e1dd3db76e4be892661cf/core/src/main/scala/kafka/log/Log.scala#L614]
 when recovering segments that do not have a clean shutdown file. It goes over 
the record batches and rebuilds the leader epoch.
KAFKA-7959's implementation guards against this by checking that the 
log.message.format supports it, *but* that issue is only merged for versions 
*below 2.1*.

Moreover, the case where `log.message.format >= 0.11` *is not handled*. If a 
broker has the following log segment file:
{code:java}
offset 0, format v2, epoch 1
offset 1, format v2, epoch 1
offset 2, format v1, no epoch
offset 3, format v1, no epoch
{code}
and gets upgraded to a new log message format that supports leader epochs, then the 
rebuild of any logs that had an unclean shutdown will populate the leader epoch cache 
again, potentially resulting in the issue described in KAFKA-7897.

One potential simple way to solve this is to clear the accumulated leader epoch 
cache when encountering a batch with no epoch upon segment rebuilding.
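
A sketch of that proposal; RecordBatchInfo and LeaderEpochCache are hypothetical 
stand-ins for the real log and leader-epoch types:

{code:java}
import java.util.List;

public class EpochRebuildSketch {
    interface RecordBatchInfo {
        boolean hasLeaderEpoch();   // false for batches in a message format older than v2
        int leaderEpoch();
        long baseOffset();
    }

    interface LeaderEpochCache {
        void assign(int epoch, long startOffset);
        void clear();
    }

    // While rebuilding, drop everything accumulated so far as soon as a batch
    // without an epoch is encountered, so old-format data never leaves behind a
    // partially populated cache.
    static void rebuild(List<RecordBatchInfo> batches, LeaderEpochCache cache) {
        for (RecordBatchInfo batch : batches) {
            if (batch.hasLeaderEpoch()) {
                cache.assign(batch.leaderEpoch(), batch.baseOffset());
            } else {
                cache.clear();
            }
        }
    }
}
{code}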



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-430 - Return Authorized Operations in Describe Responses

2019-02-22 Thread Colin McCabe
Hi Rajini,

Thanks for the KIP!

The KIP specifies that "Authorized operations will be returned as [an] INT8 
consistent with [the] AclOperation used in ACL requests and responses."  But 
there may be more than one AclOperation that is applied to a given resource.  
For example, a principal may have both READ and WRITE permission on a topic.

One option for representing this would be a bitfield.  A 32-bit bitfield could 
have the appropriate bits set.  For example, if READ and WRITE operations were 
permitted, bits 3 and 4 could be set.

Another thing to think about here is that certain AclOperations imply certain 
others.  For example, having WRITE on a topic gives you DESCRIBE on that topic 
as well automatically.  Does that mean that a topic with WRITE on it should 
automatically get DESCRIBE set in the bitfield?  I would argue that the answer 
is yes, for consistency's sake.

We will inevitably add new AclOperations over time, and we have to think about 
how to do this in a compatible way.  The simplest approach would be to just 
leave out the new AclOperations when a describe request comes in from an older 
version client.  This should be spelled out in the compatibility section.
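
For illustration, one possible bitfield encoding uses the existing AclOperation codes 
as bit positions (READ has code 3 and WRITE code 4, so they would set bits 3 and 4, as 
in the example above). This is only a sketch of the idea, not what the KIP specifies:

{code:java}
import java.util.EnumSet;
import java.util.Set;
import org.apache.kafka.common.acl.AclOperation;

public class AclOperationBitfield {
    // Set bit N of a 32-bit int for each permitted operation, where N is AclOperation.code().
    public static int encode(Set<AclOperation> operations) {
        int bits = 0;
        for (AclOperation op : operations) {
            bits |= 1 << op.code();
        }
        return bits;
    }

    public static Set<AclOperation> decode(int bits) {
        Set<AclOperation> operations = EnumSet.noneOf(AclOperation.class);
        for (AclOperation op : AclOperation.values()) {
            if (op != AclOperation.UNKNOWN && (bits & (1 << op.code())) != 0) {
                operations.add(op);
            }
        }
        return operations;
    }
}
{code}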

best,
Colin


On Thu, Feb 21, 2019, at 02:28, Rajini Sivaram wrote:
> I would like to start vote on KIP-430 to optionally obtain authorized
> operations when describing resources:
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-430+-+Return+Authorized+Operations+in+Describe+Responses
> 
> Thank you,
> 
> Rajini
>


Re: [DISCUSSION] KIP-422: Add support for user/client configuration in the Kafka Admin Client

2019-02-22 Thread Colin McCabe
Hi Yaodong,

KIP-422 says that it would be good if "applications [could] leverage the 
unified KafkaAdminClient to manage their user/client configurations, instead of 
the direct dependency on Zookeeper."  But the KIP doesn't talk about any 
changes to KafkaAdminClient.  Instead, the only changes proposed are to 
AdminZKClient.  But that is an internal class -- we don't need a KIP to change 
it, and it's not a public API that users can use.

I realize that the naming might be a bit confusing, but kafka.zk.AdminZKClient 
and kafka.admin.AdminClient are internal classes.  As the JavaDoc says, 
kafka.admin.AdminClient is deprecated as well.  The public class that we would 
be adding new methods to is org.apache.kafka.clients.admin.AdminClient.

best,
Colin

On Tue, Feb 19, 2019, at 15:21, Yaodong Yang wrote:
> Hello Jun, Viktor, Snoke and Stan,
> 
> Thanks for taking time to look at this KIP-422! For some reason, this email
> was put in my spam folder. Sorry about that.
> 
> Jun is right, the main motivation for this KIP-422 is to allow users to
> config user/clientId quota through AdminClient. In addition, this KIP-422
> also allows users to set or update any config related to a user or clientId
> entity if needed in the future.
> 
> For the KIP-257, I agree with Jun that we should add support for it. I will
> look at the current implementation and update the KIP-422 with new change.
> 
> I will ping this thread once I updated the KIP.
> 
> Thanks again!
> Yaodong
> 
> On Fri, Feb 15, 2019 at 1:28 AM Viktor Somogyi-Vass 
> wrote:
> 
> > Hi Guys,
> >
> > I wanted to reject that KIP, split it up and revamp it as in the meantime
> > there were some overlapping works I just didn't get to it due to other
> > higher priority work.
> > One of the splitted KIPs would have been the quota part of that and I'd be
> > happy if that lived in this KIP if Yaodong thinks it's worth to
> > incorporate. I'd be also happy to rebase that wire protocol and contribute
> > it to this KIP.
> >
> > Viktor
> >
> > On Wed, Feb 13, 2019 at 7:14 PM Jun Rao  wrote:
> >
> > > Hi, Yaodong,
> > >
> > > Thanks for the KIP. As Stan mentioned earlier, it seems that this is
> > > mostly covered by KIP-248, which was originally proposed by Victor.
> > >
> > > Hi, Victor,
> > >
> > > Do you still plan to work on KIP-248? It seems that you already got
> > pretty
> > > far on that. If not, would you mind letting Yaodong take over this?
> > >
> > > For both KIP-248 and KIP-422, one thing that I found missing is the
> > > support for customized quota (
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-257+-+Configurable+Quota+Management
> > ).
> > > With KIP-257, it's possible for one to construct a customized quota
> > defined
> > > through a map of metric tags. It would be useful to support that in the
> > > AdminClient API and the wire protocol.
> > >
> > > Hi, Sonke,
> > >
> > > I think the proposal is to support the user/clientId level quota through
> > > an AdminClient api. The user can be obtained from any existing
> > > authentication mechanisms.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Feb 7, 2019 at 5:59 AM Sönke Liebau
> > >  wrote:
> > >
> > >> Hi Yaodong,
> > >>
> > >> thanks for the KIP!
> > >>
> > >> If I understand your intentions correctly then this KIP would only
> > >> address a fairly specific use case, namely SASL-PLAIN with users
> > >> defined in Zookeeper. For all other authentication mechanisms like
> > >> SSL, SASL-GSSAPI or SASL-PLAIN with users defined in jaas files I
> > >> don't see how the AdminClient could directly create new users.
> > >> Is this correct, or am I missing something?
> > >>
> > >> Best regards,
> > >> Sönke
> > >>
> > >> On Thu, Feb 7, 2019 at 2:47 PM Stanislav Kozlovski
> > >>  wrote:
> > >> >
> > >> > This KIP seems to duplicate some of the functionality proposed in
> > >> KIP-248
> > >> > <
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-248+-+Create+New+ConfigCommand+That+Uses+The+New+AdminClient
> > >> >.
> > >> > KIP-248 has been stuck in a vote thread since July 2018.
> > >> >
> > >> > Viktor, do you plan on working on the KIP?
> > >> >
> > >> > On Thu, Feb 7, 2019 at 1:27 PM Stanislav Kozlovski <
> > >> stanis...@confluent.io>
> > >> > wrote:
> > >> >
> > >> > > Hey there Yaodong, thanks for the KIP!
> > >> > >
> > >> > > I'm not too familiar with the user/client configurations we
> > currently
> > >> > > allow, is there a KIP describing the initial feature? If there is,
> > it
> > >> would
> > >> > > be useful to include in KIP-422.
> > >> > >
> > >> > > I also didn't see any authorization in the PR, have we thought about
> > >> > > needing to authorize the alter/describe requests per the
> > user/client?
> > >> > >
> > >> > > Thanks,
> > >> > > Stanislav
> > >> > >
> > >> > > On Fri, Jan 25, 2019 at 5:47 PM Yaodong Yang <
> > yangyaodon...@gmail.com
> > >> >
> > >> > > wrote:
> > >> > >
> > >> > >> Hi folks,
> > >> > >>
> > >> > >> I've published K

[jira] [Created] (KAFKA-7985) Cleanup AssignedTasks / AbstractTask logic

2019-02-22 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-7985:


 Summary: Cleanup AssignedTasks / AbstractTask logic
 Key: KAFKA-7985
 URL: https://issues.apache.org/jira/browse/KAFKA-7985
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang


Today the lifetime of a task is:

created -> [initializeStateStores] -> 
restoring (writes to the initialized state stores) -> [initializeTopology] -> 
running -> [closeTopology] -> 
suspended -> [closeStateManager] -> 
dead

Hence the assigned tasks contain the following non-overlapping sets: 
created, restoring, running, suspended (dead tasks do not need to be 
maintained). Normally `created` should be empty, since once a task is created it 
should immediately transition to either restoring or running. So whenever 
we are suspending tasks, we should go through these sets and act accordingly:

1. `created` and `suspended`: just check that these two sets are always empty.
2. `running`: transition to `suspended`.
3. `restoring`: transition to `suspended`. The difference here is that we do 
not need to close the topology, since it was never created at all; we just need 
to remember the restored position and keep the restorers on hold instead of 
clearing all of them.
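
A sketch of the suspend transition described above (names are illustrative, not the 
actual Streams classes):

{code:java}
public class TaskLifecycleSketch {
    enum TaskState { CREATED, RESTORING, RUNNING, SUSPENDED, DEAD }

    static TaskState onSuspend(TaskState current) {
        switch (current) {
            case RUNNING:
                // closeTopology(), then mark suspended.
                return TaskState.SUSPENDED;
            case RESTORING:
                // No topology to close: remember the restored position and keep
                // the restorers on hold, then mark suspended.
                return TaskState.SUSPENDED;
            case CREATED:
            case SUSPENDED:
                // These sets should always be empty when suspending.
                throw new IllegalStateException("Unexpected task state: " + current);
            default:
                return current;
        }
    }
}
{code}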



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-trunk-jdk8 #3408

2019-02-22 Thread Apache Jenkins Server
See 


Changes:

[bbejeck] KAFKA-7492 : Updated javadocs for aggregate and reduce methods 
returning

[junrao] KAFKA-7864; validate partitions are 0-based (#6246)

--
[...truncated 2.31 MB...]
org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaSimple PASSED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaAggregate STARTED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaAggregate PASSED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaTransform STARTED

org.apache.kafka.streams.scala.TopologyTest > 
shouldBuildIdenticalTopologyInJavaNScalaTransform PASSED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWordsMaterialized 
STARTED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWordsMaterialized 
PASSED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWordsJava STARTED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWordsJava PASSED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWords STARTED

org.apache.kafka.streams.scala.WordCountTest > testShouldCountWords PASSED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialized 
should create a Materialized with Serdes STARTED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialized 
should create a Materialized with Serdes PASSED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialize 
with a store name should create a Materialized with Serdes and a store name 
STARTED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialize 
with a store name should create a Materialized with Serdes and a store name 
PASSED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialize 
with a window store supplier should create a Materialized with Serdes and a 
store supplier STARTED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialize 
with a window store supplier should create a Materialized with Serdes and a 
store supplier PASSED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialize 
with a key value store supplier should create a Materialized with Serdes and a 
store supplier STARTED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialize 
with a key value store supplier should create a Materialized with Serdes and a 
store supplier PASSED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialize 
with a session store supplier should create a Materialized with Serdes and a 
store supplier STARTED

org.apache.kafka.streams.scala.kstream.MaterializedTest > Create a Materialize 
with a session store supplier should create a Materialized with Serdes and a 
store supplier PASSED

org.apache.kafka.streams.scala.kstream.KTableTest > filter a KTable should 
filter records satisfying the predicate STARTED

org.apache.kafka.streams.scala.kstream.KTableTest > filter a KTable should 
filter records satisfying the predicate PASSED

org.apache.kafka.streams.scala.kstream.KTableTest > filterNot a KTable should 
filter records not satisfying the predicate STARTED

org.apache.kafka.streams.scala.kstream.KTableTest > filterNot a KTable should 
filter records not satisfying the predicate PASSED

org.apache.kafka.streams.scala.kstream.KTableTest > join 2 KTables should join 
correctly records STARTED

org.apache.kafka.streams.scala.kstream.KTableTest > join 2 KTables should join 
correctly records PASSED

org.apache.kafka.streams.scala.kstream.KTableTest > join 2 KTables with a 
Materialized should join correctly records and state store STARTED

org.apache.kafka.streams.scala.kstream.KTableTest > join 2 KTables with a 
Materialized should join correctly records and state store PASSED

org.apache.kafka.streams.scala.kstream.GroupedTest > Create a Grouped should 
create a Grouped with Serdes STARTED

org.apache.kafka.streams.scala.kstream.GroupedTest > Create a Grouped should 
create a Grouped with Serdes PASSED

org.apache.kafka.streams.scala.kstream.GroupedTest > Create a Grouped with 
repartition topic name should create a Grouped with Serdes, and repartition 
topic name STARTED

org.apache.kafka.streams.scala.kstream.GroupedTest > Create a Grouped with 
repartition topic name should create a Grouped with Serdes, and repartition 
topic name PASSED

org.apache.kafka.streams.scala.kstream.ProducedTest > Create a Produced should 
create a Produced with Serdes STARTED

org.apache.kafka.streams.scala.kstream.ProducedTest > Create a Produced should 
create a Produced with Serdes PASSED

org.apache.kafka.streams.scala.kstream.ProducedTest > Create a Produced with 
timestampExtractor and resetPolicy should create a Consumed with Serdes, 
timestam

Re: [VOTE] KIP-430 - Return Authorized Operations in Describe Responses

2019-02-22 Thread Rajini Sivaram
Hi Colin,

Thanks for the review. Sorry, I meant that an array of INT8s, each of which
is an AclOperation code, will be returned. I have clarified that in the KIP.

All permitted operations will be returned from the set of supported
operations on each resource. This is regardless of whether the access was
implicitly or explicitly granted. Have clarified that in the KIP.

Since the values returned are INT8 codes, clients can simply ignore any
they don't recognize. Java clients convert these into AclOperation.UNKNOWN.
That way we don't need to update Metadata/describe request versions when
new operations are added to a resource. This is consistent with
DescribeAcls behaviour. Have added this to the compatibility section of the
KIP.
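
For illustration, a client-side sketch of that behaviour using the existing
AclOperation.fromCode() mapping (the class name here is an assumption):

{code:java}
import java.util.EnumSet;
import java.util.Set;
import org.apache.kafka.common.acl.AclOperation;

public class AuthorizedOperationsDecoder {
    // Unrecognized codes map to AclOperation.UNKNOWN and are simply skipped,
    // so older clients ignore operations added in newer broker versions.
    public static Set<AclOperation> decode(byte[] codes) {
        Set<AclOperation> operations = EnumSet.noneOf(AclOperation.class);
        for (byte code : codes) {
            AclOperation op = AclOperation.fromCode(code);
            if (op != AclOperation.UNKNOWN) {
                operations.add(op);
            }
        }
        return operations;
    }
}
{code}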

Thank you,

Rajini



On Fri, Feb 22, 2019 at 6:46 PM Colin McCabe  wrote:

> Hi Rajini,
>
> Thanks for the KIP!
>
> The KIP specifies that "Authorized operations will be returned as [an]
> INT8 consistent with [the] AclOperation used in ACL requests and
> responses."  But there may be more than one AclOperation that is applied to
> a given resource.  For example, a principal may have both READ and WRITE
> permission on a topic.
>
> One option for representing this would be a bitfield.  A 32-bit bitfield
> could have the appropriate bits set.  For example, if READ and WRITE
> operations were permitted, bits 3 and 4 could be set.
>
> Another thing to think about here is that certain AclOperations imply
> certain others.  For example, having WRITE on a topic gives you DESCRIBE on
> that topic as well automatically.  Does that mean that a topic with WRITE
> on it should automatically get DESCRIBE set in the bitfield?  I would argue
> that the answer is yes, for consistency's sake.
>
> We will inevitably add new AclOperations over time, and we have to think
> about how to do this in a compatible way.  The simplest approach would be
> to just leave out the new AclOperations when a describe request comes in
> from an older version client.  This should be spelled out in the
> compatibility section.
>
> best,
> Colin
>
>
> On Thu, Feb 21, 2019, at 02:28, Rajini Sivaram wrote:
> > I would like to start vote on KIP-430 to optionally obtain authorized
> > operations when describing resources:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-430+-+Return+Authorized+Operations+in+Describe+Responses
> >
> > Thank you,
> >
> > Rajini
> >
>


[jira] [Created] (KAFKA-7987) a broker's ZK session may die on transient auth failure

2019-02-22 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-7987:
--

 Summary: a broker's ZK session may die on transient auth failure
 Key: KAFKA-7987
 URL: https://issues.apache.org/jira/browse/KAFKA-7987
 Project: Kafka
  Issue Type: Improvement
Reporter: Jun Rao


After a transient network issue, we saw the following log in a broker.
{code:java}
[23:37:02,102] ERROR SASL authentication with Zookeeper Quorum member failed: 
javax.security.sasl.SaslException: An error: 
(java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
GSS initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Server not found in Kerberos database (7))]) occurred when 
evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will 
go to AUTH_FAILED state. (org.apache.zookeeper.ClientCnxn)
[23:37:02,102] ERROR [ZooKeeperClient] Auth failed. 
(kafka.zookeeper.ZooKeeperClient)
{code}
The network issue prevented the broker from communicating with ZK. The broker's
ZK session then expired, but the broker didn't know that yet since it couldn't
establish a connection to ZK. When the network came back, the broker tried to
establish a connection to ZK, but failed due to an auth failure (likely caused
by a transient KDC issue). The current logic just ignores the auth failure
without trying to create a new ZK session, so the broker ends up permanently in
a state where it's alive but not registered in ZK.
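
For illustration only, a sketch of the kind of handling that avoids getting
stuck (plain ZooKeeper client API; the class and method names are illustrative
and not the actual kafka.zookeeper.ZooKeeperClient code): on an AuthFailed or
Expired event, close the dead handle and retry with a brand new session
instead of giving up.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ReconnectingZkSession implements Watcher {
    private final String connectString;
    private final int sessionTimeoutMs;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private volatile ZooKeeper zk;

    public ReconnectingZkSession(String connectString, int sessionTimeoutMs) throws Exception {
        this.connectString = connectString;
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.zk = new ZooKeeper(connectString, sessionTimeoutMs, this);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getState() == Watcher.Event.KeeperState.AuthFailed
                || event.getState() == Watcher.Event.KeeperState.Expired) {
            // Don't stay in AUTH_FAILED forever: close the dead handle and
            // retry with a brand new session after a short backoff.
            scheduler.schedule(this::recreateSession, 5, TimeUnit.SECONDS);
        }
    }

    private void recreateSession() {
        try {
            zk.close();
            zk = new ZooKeeper(connectString, sessionTimeoutMs, this);
        } catch (Exception e) {
            // The KDC may still be unavailable; keep retrying.
            scheduler.schedule(this::recreateSession, 5, TimeUnit.SECONDS);
        }
    }
}
{code}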

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7986) distinguish the logging from different ZooKeeperClient instances

2019-02-22 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-7986:
--

 Summary: distinguish the logging from different ZooKeeperClient 
instances
 Key: KAFKA-7986
 URL: https://issues.apache.org/jira/browse/KAFKA-7986
 Project: Kafka
  Issue Type: Improvement
Reporter: Jun Rao


It's possible for each broker to have more than 1 ZooKeeperClient instance. For 
example, SimpleAclAuthorizer creates a separate ZooKeeperClient instance when 
configured. It would be useful to distinguish the logging from different 
ZooKeeperClient instances.
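
One simple way to do this (a sketch of the idea, not the actual patch) is to
give each instance a caller-supplied name and prefix every log line with it:
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class NamedZkClient {
    private static final Logger log = LoggerFactory.getLogger(NamedZkClient.class);
    private final String logPrefix;

    // name identifies the owner, e.g. "ZooKeeperClient Kafka server" or
    // "ZooKeeperClient ACL authorizer", so log lines from different
    // instances can be told apart.
    public NamedZkClient(String name) {
        this.logPrefix = "[" + name + "] ";
    }

    public void connect(String connectString) {
        log.info("{}Initializing a new session to {}.", logPrefix, connectString);
        // ... create the underlying ZooKeeper session here ...
    }
}
{code}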



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : kafka-trunk-jdk11 #308

2019-02-22 Thread Apache Jenkins Server
See 




Build failed in Jenkins: kafka-trunk-jdk8 #3409

2019-02-22 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-7672: Restoring tasks need to be closed upon task suspension

--
[...truncated 2.31 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue PASSED

org.apache

[DISCUSS] KIP-433: Provide client API version to authorizer

2019-02-22 Thread Ying Zheng



Re: [DISCUSS] KIP-433: Provide client API version to authorizer

2019-02-22 Thread Gwen Shapira
Link, for convenience:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-433%3A+Provide+client+API+version+to+authorizer

I actually prefer the first rejected alternative (add a
configuration). While you are right that configuration is inherently
less flexible, putting the logic in the authorizer means that an admin
that wants to limit the allowed client API versions has to implement
an authorizer. This is more challenging than changing a config (and
AFAIK, can't be done dynamically - configs can be dynamic and the
admin can avoid a restart).

Would be interested to hear what others think.

Gwen

On Fri, Feb 22, 2019 at 2:11 PM Ying Zheng  wrote:
>
>


-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [DISCUSS] KIP-431: Support of printing additional ConsumerRecord fields in DefaultMessageFormatter

2019-02-22 Thread Gwen Shapira
Love it. Thanks.

On Mon, Feb 18, 2019 at 2:00 PM Mateusz Zakarczemny
 wrote:
>
> Hi all,
>
> I have created a KIP to support additional message fields in console
> consumer:
> KIP-431 - Support of printing additional ConsumerRecord fields in
> DefaultMessageFormatter
> 
>
> The main purpose of the proposed change is to allow printing message
> offset, partition and headers in console consumer. Changes are backward
> compatible and impact only console consumer parameters.
>
> PR: https://github.com/apache/kafka/pull/4807
> Jira ticket: KAFKA-6733 
>
> I'm waiting for your feedback.
>
> Regards,
> Mateusz Zakarczemny



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [DISCUSS] KIP-433: Provide client API version to authorizer

2019-02-22 Thread Ying Zheng
Hi Gwen,

Thank you for the quick feedback!

It's a good point that broker configuration can be dynamic and is more
convenient. Technically, anything inside the authorizer can also be
dynamic. For example, the SimpleAclAuthorizer in Kafka stores ACLs in
Zookeeper, which can be dynamically changed with CLI.
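
To sketch that point: a custom authorizer could keep its version policy in
ZooKeeper and re-read it whenever the znode changes, so it stays dynamic
without a broker restart. The znode path and the idea of the authorizer
receiving the request's API version are illustrative assumptions, not part of
the current Authorizer interface:
{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

/**
 * Sketch of a dynamically reconfigurable policy a custom authorizer could
 * consult: the minimum allowed client API version is stored in ZooKeeper and
 * re-read whenever the znode changes, so no broker restart is needed.
 */
public class MinApiVersionPolicy implements Watcher {
    private static final String PATH = "/config/min-client-api-version"; // illustrative path
    private final ZooKeeper zk;
    private volatile short minVersion = 0;

    public MinApiVersionPolicy(ZooKeeper zk) throws Exception {
        this.zk = zk;
        refresh();
    }

    public boolean allowed(short requestApiVersion) {
        return requestApiVersion >= minVersion;
    }

    @Override
    public void process(WatchedEvent event) {
        try {
            refresh();  // re-read the value and re-register the watch
        } catch (Exception e) {
            // Best effort: keep the previous value on transient errors.
        }
    }

    private void refresh() throws Exception {
        byte[] data = zk.getData(PATH, this, null);  // registers this as the watcher
        minVersion = Short.parseShort(new String(data, StandardCharsets.UTF_8).trim());
    }
}
{code}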





On Fri, Feb 22, 2019 at 2:41 PM Gwen Shapira  wrote:

> Link, for convenience:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-433%3A+Provide+client+API+version+to+authorizer
>
> I actually prefer the first rejected alternative (add a
> configuration). While you are right that configuration is inherently
> less flexible, putting the logic in the authorizer means that an admin
> that wants to limit the allowed client API versions has to implement
> an authorizer. This is more challenging than changing a config (and
> AFAIK, can't be done dynamically - configs can be dynamic and the
> admin can avoid a restart).
>
> Would be interested to hear what others think.
>
> Gwen
>
> On Fri, Feb 22, 2019 at 2:11 PM Ying Zheng  wrote:
> >
> >
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


Re: [DISCUSSION] KIP-422: Add support for user/client configuration in the Kafka Admin Client

2019-02-22 Thread Yaodong Yang
Hi Colin,

CIL,

Thanks!
Yaodong


On Fri, Feb 22, 2019 at 10:56 AM Colin McCabe  wrote:

> Hi Yaodong,
>
> KIP-422 says that it would be good if "applications [could] leverage the
> unified KafkaAdminClient to manage their user/client configurations,
> instead of the direct dependency on Zookeeper."  But the KIP doesn't talk
> about any changes to KafkaAdminClient.  Instead, the only changes proposed
> are to AdminZKClient.  But  that is an internal class-- we don't need a KIP
> to change it, and it's not a public API that users can use.
>

Sorry for the confusion in the KIP. Actually, there is no change to
AdminZKClient needed for this KIP; we just leverage it to configure the
properties in ZooKeeper. You can find the details in this PR:
https://github.com/apache/kafka/pull/6189

As you can see from the PR, we need both client-side and server-side changes,
so I feel we still need the KIP for this change.
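
As a rough sketch of the call shape this is aiming for (the interface and
method name below are purely hypothetical; the real additions are whatever the
KIP and PR define), using the existing quota config names that live under
/config/users/<user>/clients/<client-id> in ZooKeeper:
{code:java}
import java.util.Properties;

public class UserClientQuotaExample {

    // Purely hypothetical client-side interface, for illustration only; the
    // actual AdminClient additions are defined by the KIP and its PR.
    interface UserClientConfigAdmin {
        void alterUserClientConfig(String user, String clientId, Properties configs);
    }

    public static void main(String[] args) {
        Properties quota = new Properties();
        // Existing Kafka quota config names, normally stored in ZooKeeper
        // under /config/users/<user>/clients/<client-id>.
        quota.put("producer_byte_rate", "1048576");  // 1 MB/s
        quota.put("consumer_byte_rate", "2097152");  // 2 MB/s

        UserClientConfigAdmin admin = (user, clientId, configs) -> {
            // A real implementation would send these to the broker (or, today,
            // write them to ZooKeeper the way kafka-configs.sh does).
            System.out.printf("Would set %s for user=%s, client-id=%s%n", configs, user, clientId);
        };
        admin.alterUserClientConfig("alice", "mobile-app", quota);
    }
}
{code}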


> I realize that the naming might be a bit confusing, but
> kafka.zk.AdminZKClient and kafka.admin.AdminClient are internal classes.
> As the JavaDoc says, kafka.admin.AdminClient is deprecated as well.  The
> public class that we would be adding new methods to is
> org.apache.kafka.clients.admin.AdminClient.
>

I agree. Thanks for pointing this out!


> best,
> Colin
>
> On Tue, Feb 19, 2019, at 15:21, Yaodong Yang wrote:
> > Hello Jun, Viktor, Snoke and Stan,
> >
> > Thanks for taking time to look at this KIP-422! For some reason, this
> email
> > was put in my spam folder. Sorry about that.
> >
> > Jun is right, the main motivation for this KIP-422 is to allow users to
> > config user/clientId quota through AdminClient. In addition, this KIP-422
> > also allows users to set or update any config related to a user or
> clientId
> > entity if needed in the future.
> >
> > For the KIP-257, I agree with Jun that we should add support for it. I
> will
> > look at the current implementation and update the KIP-422 with new
> change.
> >
> > I will ping this thread once I updated the KIP.
> >
> > Thanks again!
> > Yaodong
> >
> > On Fri, Feb 15, 2019 at 1:28 AM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com>
> > wrote:
> >
> > > Hi Guys,
> > >
> > > I wanted to reject that KIP, split it up and revamp it as in the
> meantime
> > > there were some overlapping works I just didn't get to it due to other
> > > higher priority work.
> > > One of the splitted KIPs would have been the quota part of that and
> I'd be
> > > happy if that lived in this KIP if Yaodong thinks it's worth to
> > > incorporate. I'd be also happy to rebase that wire protocol and
> contribute
> > > it to this KIP.
> > >
> > > Viktor
> > >
> > > On Wed, Feb 13, 2019 at 7:14 PM Jun Rao  wrote:
> > >
> > > > Hi, Yaodong,
> > > >
> > > > Thanks for the KIP. As Stan mentioned earlier, it seems that this is
> > > > mostly covered by KIP-248, which was originally proposed by Victor.
> > > >
> > > > Hi, Victor,
> > > >
> > > > Do you still plan to work on KIP-248? It seems that you already got
> > > pretty
> > > > far on that. If not, would you mind letting Yaodong take over this?
> > > >
> > > > For both KIP-248 and KIP-422, one thing that I found missing is the
> > > > support for customized quota (
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-257+-+Configurable+Quota+Management
> > > ).
> > > > With KIP-257, it's possible for one to construct a customized quota
> > > defined
> > > > through a map of metric tags. It would be useful to support that in
> the
> > > > AdminClient API and the wire protocol.
> > > >
> > > > Hi, Sonke,
> > > >
> > > > I think the proposal is to support the user/clientId level quota
> through
> > > > an AdminClient api. The user can be obtained from any existing
> > > > authentication mechanisms.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Feb 7, 2019 at 5:59 AM Sönke Liebau
> > > >  wrote:
> > > >
> > > >> Hi Yaodong,
> > > >>
> > > >> thanks for the KIP!
> > > >>
> > > >> If I understand your intentions correctly then this KIP would only
> > > >> address a fairly specific use case, namely SASL-PLAIN with users
> > > >> defined in Zookeeper. For all other authentication mechanisms like
> > > >> SSL, SASL-GSSAPI or SASL-PLAIN with users defined in jaas files I
> > > >> don't see how the AdminClient could directly create new users.
> > > >> Is this correct, or am I missing something?
> > > >>
> > > >> Best regards,
> > > >> Sönke
> > > >>
> > > >> On Thu, Feb 7, 2019 at 2:47 PM Stanislav Kozlovski
> > > >>  wrote:
> > > >> >
> > > >> > This KIP seems to duplicate some of the functionality proposed in
> > > >> KIP-248
> > > >> > <
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-248+-+Create+New+ConfigCommand+That+Uses+The+New+AdminClient
> > > >> >.
> > > >> > KIP-248 has been stuck in a vote thread since July 2018.
> > > >> >
> > > >> > Viktor, do you plan on working on the KIP?
> > > >> >
> > > >> > On Thu, Feb 7, 2019 at 1:27 PM Stanislav Kozlo

[jira] [Resolved] (KAFKA-7959) Clear/delete epoch cache if old message format is in use

2019-02-22 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-7959.

   Resolution: Fixed
Fix Version/s: 2.0.2

> Clear/delete epoch cache if old message format is in use
> 
>
> Key: KAFKA-7959
> URL: https://issues.apache.org/jira/browse/KAFKA-7959
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Stanislav Kozlovski
>Priority: Major
> Fix For: 2.0.2
>
>
> Because of KAFKA-7897, it is possible to have a sparse epoch cache when using 
> the old message format. The fix for that issue addresses the problem of 
> improper use of that cache while the message format remains on an older 
> version. However, it leaves the possibility of misuse during a message format 
> upgrade, which can cause unexpected truncation and re-replication. To fix the 
> problem, we should delete or at least clear the cache whenever the old 
> message format is used.
> Note that this problem was fixed unintentionally in 2.1 with the patch for 
> KAFKA-7897. This issue applies specifically to the 2.0 branch.
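
A sketch of the intent of the fix (class and method names are illustrative,
not the actual patch): when the partition is on a pre-0.11 message format,
records carry no leader epoch, so the on-disk leader-epoch-checkpoint is
meaningless and can simply be removed rather than letting a sparse cache drive
truncation decisions.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EpochCacheCleanup {
    // The on-disk checkpoint that backs the leader epoch cache.
    private static final String CHECKPOINT_FILE = "leader-epoch-checkpoint";

    public static void clearIfOldFormat(Path partitionDir, boolean oldMessageFormat) throws IOException {
        if (oldMessageFormat) {
            Files.deleteIfExists(partitionDir.resolve(CHECKPOINT_FILE));
        }
    }
}
{code}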



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : kafka-2.2-jdk8 #29

2019-02-22 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-430 - Return Authorized Operations in Describe Responses

2019-02-22 Thread Colin McCabe
Hi Rajini,

Thanks for the explanations.

On Fri, Feb 22, 2019, at 11:59, Rajini Sivaram wrote:
> Hi Colin,
> 
> Thanks for the review. Sorry I meant that an array of INT8's, each of which
> is an AclOperation code will be returned. I have clarified that in the KIP.

Do you think it's worth considering a bitfield here still?  An array will take 
up at least 4 bytes for the length, plus whatever length the elements are.  A 
32-bit bitfield would pretty much always take up less space.  And we can have a 
new version of the RPC with 64 bits or whatever if we outgrow 32 operations.  
MetadataResponse for a big cluster could contain quite a lot of topics, tens or 
hundreds of thousands.  So the space savings could be considerable.
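
To make the size argument concrete, a quick sketch of such an encoding
(illustrative, not part of the KIP as written): using AclOperation.code() as
the bit index, READ and WRITE set bits 3 and 4, and a resource's whole set of
permitted operations fits in a single INT32.
{code:java}
import java.util.EnumSet;
import java.util.Set;
import org.apache.kafka.common.acl.AclOperation;

public class OperationsBitfield {
    // Encode each permitted operation as one bit, indexed by its INT8 code,
    // so the whole set costs four bytes per resource.
    public static int encode(Set<AclOperation> ops) {
        int bits = 0;
        for (AclOperation op : ops)
            bits |= 1 << op.code();
        return bits;
    }

    public static Set<AclOperation> decode(int bits) {
        Set<AclOperation> ops = EnumSet.noneOf(AclOperation.class);
        for (AclOperation op : AclOperation.values())
            if (op != AclOperation.UNKNOWN && op != AclOperation.ANY
                    && (bits & (1 << op.code())) != 0)
                ops.add(op);
        return ops;
    }
}
{code}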

> 
> All permitted operations will be returned from the set of supported
> operations on each resource. This is regardless of whether the access was
> implicitly or explicitly granted. Have clarified that in the KIP.

Thanks.

> 
> Since the values returned are INT8 codes, clients can simply ignore any
> they don't recognize. Java clients convert these into AclOperation.UNKNOWN.
> That way we don't need to update Metadata/describe request versions when
> new operations are added to a resource. This is consistent with
> DescribeAcls behaviour. Have added this to the compatibility section of the
> KIP.

Displaying "unknown" for new AclOperations made sense for DescribeAcls, since 
the ACL is explicitly referencing the new AclOperation.   For example, if you 
upgrade your Kafka cluster to a new version that supports DESCRIBE_CONFIGS, 
your old ACLs still don't reference DESCRIBE_CONFIGS.

In contrast, in the case here, existing topics (or other resources) might pick 
up the new AclOperation just by upgrading Kafka.  For example, if you had ALL 
permission on a topic and you upgrade to a new version with DESCRIBE_CONFIGS, 
you now have DESCRIBE_CONFIGS permission on that topic.  This would result in a 
lot of "unknowns" being displayed here, which might not be ideal.

Also, there is an argument from intent-- the intention here is to let you know 
what you can do with a resource that already exists.  Knowing that you can do 
an unknown thing isn't very useful.  In contrast, for DescribeAcls, knowing 
that an ACL references an operation your software is too old to understand is 
useful (you may choose not to modify that ACL, since you don't know what it 
does, for example.)  What do you think?

cheers,
Colin


> 
> Thank you,
> 
> Rajini
> 
> 
> 
> On Fri, Feb 22, 2019 at 6:46 PM Colin McCabe  wrote:
> 
> > Hi Rajini,
> >
> > Thanks for the KIP!
> >
> > The KIP specifies that "Authorized operations will be returned as [an]
> > INT8 consistent with [the] AclOperation used in ACL requests and
> > responses."  But there may be more than one AclOperation that is applied to
> > a given resource.  For example, a principal may have both READ and WRITE
> > permission on a topic.
> >
> > One option for representing this would be a bitfield.  A 32-bit bitfield
> > could have the appropriate bits set.  For example, if READ and WRITE
> > operations were permitted, bits 3 and 4 could be set.
> >
> > Another thing to think about here is that certain AclOperations imply
> > certain others.  For example, having WRITE on a topic gives you DESCRIBE on
> > that topic as well automatically.  Does that mean that a topic with WRITE
> > on it should automatically get DESCRIBE set in the bitfield?  I would argue
> > that the answer is yes, for consistency's sake.
> >
> > We will inevitably add new AclOperations over time, and we have to think
> > about how to do this in a compatible way.  The simplest approach would be
> > to just leave out the new AclOperations when a describe request comes in
> > from an older version client.  This should be spelled out in the
> > compatibility section.
> >
> > best,
> > Colin
> >
> >
> > On Thu, Feb 21, 2019, at 02:28, Rajini Sivaram wrote:
> > > I would like to start vote on KIP-430 to optionally obtain authorized
> > > operations when describing resources:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-430+-+Return+Authorized+Operations+in+Describe+Responses
> > >
> > > Thank you,
> > >
> > > Rajini
> > >
> >
>


Build failed in Jenkins: kafka-2.0-jdk8 #229

2019-02-22 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-7959; Delete leader epoch cache files with old message format

--
[...truncated 2.50 MB...]
org.apache.kafka.streams.TopologyTest > shouldNotAddNullStateStoreSupplier 
STARTED

org.apache.kafka.streams.TopologyTest > shouldNotAddNullStateStoreSupplier 
PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowToAddStateStoreToNonExistingProcessor STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowToAddStateStoreToNonExistingProcessor PASSED

org.apache.kafka.streams.TopologyTest > 
sourceAndProcessorShouldHaveSingleSubtopology STARTED

org.apache.kafka.streams.TopologyTest > 
sourceAndProcessorShouldHaveSingleSubtopology PASSED

org.apache.kafka.streams.TopologyTest > 
tableNamedMaterializedCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
tableNamedMaterializedCountShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullStoreNameWhenConnectingProcessorAndStateStores STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullStoreNameWhenConnectingProcessorAndStateStores PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullNameWhenAddingSourceWithTopic STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullNameWhenAddingSourceWithTopic PASSED

org.apache.kafka.streams.TopologyTest > 
sourceAndProcessorWithStateShouldHaveSingleSubtopology STARTED

org.apache.kafka.streams.TopologyTest > 
sourceAndProcessorWithStateShouldHaveSingleSubtopology PASSED

org.apache.kafka.streams.TopologyTest > shouldNotAllowNullTopicWhenAddingSink 
STARTED

org.apache.kafka.streams.TopologyTest > shouldNotAllowNullTopicWhenAddingSink 
PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowToAddGlobalStoreWithSourceNameEqualsProcessorName STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowToAddGlobalStoreWithSourceNameEqualsProcessorName PASSED

org.apache.kafka.streams.TopologyTest > 
singleSourceWithListOfTopicsShouldHaveSingleSubtopology STARTED

org.apache.kafka.streams.TopologyTest > 
singleSourceWithListOfTopicsShouldHaveSingleSubtopology PASSED

org.apache.kafka.streams.TopologyTest > 
sourceWithMultipleProcessorsShouldHaveSingleSubtopology STARTED

org.apache.kafka.streams.TopologyTest > 
sourceWithMultipleProcessorsShouldHaveSingleSubtopology PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowZeroStoreNameWhenConnectingProcessorAndStateStores STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowZeroStoreNameWhenConnectingProcessorAndStateStores PASSED

org.apache.kafka.streams.TopologyTest > 
kTableNamedMaterializedMapValuesShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
kTableNamedMaterializedMapValuesShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
timeWindowZeroArgCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
timeWindowZeroArgCountShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
multipleSourcesWithProcessorsShouldHaveDistinctSubtopologies STARTED

org.apache.kafka.streams.TopologyTest > 
multipleSourcesWithProcessorsShouldHaveDistinctSubtopologies PASSED

org.apache.kafka.streams.TopologyTest > shouldThrowOnUnassignedStateStoreAccess 
STARTED

org.apache.kafka.streams.TopologyTest > shouldThrowOnUnassignedStateStoreAccess 
PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullProcessorNameWhenConnectingProcessorAndStateStores STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullProcessorNameWhenConnectingProcessorAndStateStores PASSED

org.apache.kafka.streams.TopologyTest > 
kTableAnonymousMaterializedMapValuesShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
kTableAnonymousMaterializedMapValuesShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
sessionWindowZeroArgCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
sessionWindowZeroArgCountShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
tableAnonymousMaterializedCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
tableAnonymousMaterializedCountShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > shouldDescribeGlobalStoreTopology 
STARTED

org.apache.kafka.streams.TopologyTest > shouldDescribeGlobalStoreTopology PASSED

org.apache.kafka.streams.TopologyTest > 
kTableNonMaterializedMapValuesShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
kTableNonMaterializedMapValuesShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
kGroupedStreamAnonymousMaterializedCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTe

Jenkins build is back to normal : kafka-1.1-jdk7 #245

2019-02-22 Thread Apache Jenkins Server
See