Kafka SASL handshake takes too long

2018-07-10 Thread Majed Al Zayer
Hello everyone,

I hope that someone could help me with this issue. I have already posted
this on:

- StackOverflow:
https://stackoverflow.com/questions/51249835/kafka-sasl-handshake-takes-too-long

- Confluent Kafka .Net @ Github:
https://github.com/confluentinc/confluent-kafka-dotnet/issues/564

No answers yet.

===

- Description: authentication using SASL/SCRAM or SASL/PLAINTEXT takes
around 9 seconds to complete. Is this normal?

- How to reproduce:

-- One Kafka broker instance (v1.1.0)

-- One C# producer (Confluent Kafka Client v0.11.4) that does the following:

/* producer code - start */

var producerConfig =
    PropertiesUtils.ReadPropertiesFile("producer.properties");

using (var producer = new Producer<Null, string>(
    producerConfig, null, new StringSerializer(Encoding.UTF8)))
{
    while (true)
    {
        Console.Write("message: ");
        string msg = Console.ReadLine();
        producer.ProduceAsync("test-topic", null, msg);
    }
}

/* producer code - end */


-- One C# consumer (Confluent Kafka Client v0.11.4) that does the following:

/* consumer code - start */

var config = PropertiesUtils.ReadPropertiesFile("consumer.properties");

using (var consumer = new Consumer<Null, string>(
    config, null, new StringDeserializer(Encoding.UTF8)))
{
    consumer.OnMessage += (_, msg) =>
    {
        Console.WriteLine(msg.Value);
    };

    consumer.OnError += (_, error) =>
        Console.WriteLine($"Error: {error}");

    consumer.OnConsumeError += (_, msg) =>
        Console.WriteLine($"Consume error ({msg.TopicPartitionOffset}): {msg.Error}");

    consumer.Subscribe("test-topic");

    while (true)
    {
        try
        {
            consumer.Poll(TimeSpan.FromMilliseconds(1000));
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }
}

/* consumer code - end */

-- server.properties:

# server.properties - start #

broker.id=0

num.network.threads=3

num.io.threads=8


socket.send.buffer.bytes=102400

socket.receive.buffer.bytes=102400

socket.request.max.bytes=104857600

session.timeout.ms=1000


group.initial.rebalance.delay.ms=0


listeners=SASL_SSL://localhost:9093


ssl.keystore.type=JKS

ssl.keystore.location=...

ssl.keystore.password=...

ssl.key.password=...


ssl.truststore.type=JKS

ssl.truststore.location=...

ssl.truststore.password=...


ssl.protocol=TLS

ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1

ssl.client.auth=required

security.inter.broker.protocol=SASL_SSL

ssl.secure.random.implementation=SHA1PRNG


sasl.enabled.mechanisms=PLAIN,SCRAM-SHA-256

sasl.mechanism.inter.broker.protocol=PLAIN


log.dirs=...

num.partitions=1

num.recovery.threads.per.data.dir=1


offsets.topic.replication.factor=1

transaction.state.log.replication.factor=1

transaction.state.log.min.isr=1


log.retention.hours=168

log.retention.bytes=1073741824

log.segment.bytes=1073741824

log.retention.check.interval.ms=30

num.replica.fetchers=1


zookeeper.connect=localhost:2181

zookeeper.connection.timeout.ms=6000

group.initial.rebalance.delay.ms=0

# server.properties - end #

-- consumer.properties:
# consumer.properties - start #

bootstrap.servers=localhost:9093

group.id=test-consumer-group

fetch.min.bytes=1

fetch.wait.max.ms=1

auto.offset.reset=latest

socket.blocking.max.ms=1

fetch.error.backoff.ms=1

ssl.ca.location=...

ssl.certificate.location=...

ssl.key.location=...

ssl.key.password=..

security.protocol=SASL_SSL

sasl.mechanisms=PLAIN

sasl.username=...

sasl.password=...

# consumer.properties - end #


-- producer.properties:
# producer.properties - start #

bootstrap.servers=localhost:9093

compression.type=none

linger.ms=0

retries=0

acks=0


ssl.ca.location=...

ssl.certificate.location=...

ssl.key.location=...

ssl.key.password=...


security.protocol=SASL_SSL

sasl.mechanisms=PLAIN

sasl.username=...

sasl.password=...

# producer.properties - end #

-- Run the consumer. It takes approximately 9 seconds to finish the SASL
handshake from request to completion. Here's the log:

[2018-07-06 17:03:37,673] DEBUG Set SASL server state to
HANDSHAKE_OR_VERSIONS_REQUEST
(org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
[2018-07-06 17:03:37,673] DEBUG Handling Kafka request API_VERSIONS
(org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
[2018-07-06 17:03:37,673] DEBUG Set SASL server state to HANDSHAKE_REQUEST
(org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
[2018-07-06 17:03:37,673] DEBUG Handling Kafka request SASL_HANDSHAKE
(org.apache.kafka.com

[jira] [Created] (KAFKA-7145) Consumer thread getting stuck in hasNext() method

2018-07-10 Thread Lovenish goyal (JIRA)
Lovenish goyal created KAFKA-7145:
-

 Summary: Consumer thread getting stuck in hasNext() method
 Key: KAFKA-7145
 URL: https://issues.apache.org/jira/browse/KAFKA-7145
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.10.0.1, 0.9.0.1
Reporter: Lovenish goyal


The consumer thread is getting stuck in the *hasNext()* method.

We are using ConsumerIterator for this; the code snippet is below:

 
{code:java}
ConsumerIterator<byte[], byte[]> mIterator;
List<KafkaStream<byte[], byte[]>> streams =
    mConsumerConnector.createMessageStreamsByFilter(topicFilter);
KafkaStream<byte[], byte[]> stream = streams.get(0);
mIterator = stream.iterator();

{code}
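
For context, a minimal sketch of the read loop that typically surrounds a snippet like the one above (the process() handler is a placeholder, not from this report). With the old high-level consumer, hasNext() blocks until a message arrives unless consumer.timeout.ms is set:

{code:java}
// Illustrative only: MessageAndMetadata comes from kafka.message (old consumer API).
while (mIterator.hasNext()) {
    MessageAndMetadata<byte[], byte[]> record = mIterator.next();
    process(record.message()); // placeholder for the application's handling logic
}
{code}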
 

When I manually check via [Kafdrop|https://github.com/HomeAdvisor/Kafdrop], I see a 
'No message found' message. I have tried the same with both Kafka 0.9 and 0.10 and 
get the same issue. 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSSION] KIP-336: Consolidate ExtendedSerializer/Serializer and ExtendedDeserializer/Deserializer

2018-07-10 Thread Viktor Somogyi
Hi Ismael,

Well, yes. If we only care about headers, then you'd need to add a dummy
implementation for the 2-parameter method as well. Although it is not
ideal, we're using the Serializer interface everywhere and convert it to
the extended one with ensureExtended(serializer), delegating to the
2-parameter method inside the wrapper returned by ensureExtended. Because
of backward compatibility we have to keep delegating, but I could perhaps
add a dummy implementation for the 2-parameter method too if you and
others think that would be better. In that case, though, we'd have an
interface where all methods are default (given the improvements of
KIP-331) and we would have to rethink whether this interface should still
be a @FunctionalInterface.
I don't really have much context currently on how the 3-parameter method
is used; most of the code samples I found on GitHub were using the
2-parameter method. I think I found one instance where the 3-parameter one
was used, but that delegated to the 2-param one :). I have to say, though,
that this research is not representative.
All in all, I think it wouldn't hurt to provide a default implementation
for the 2-param method too, but then we have to give up the
@FunctionalInterface annotation and we'll end up with an interface with no
abstract methods, only defaults.
What do you think?

Cheers,
Viktor
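
For illustration only, a rough sketch of the kind of consolidated interface being discussed; the shape, names, and defaults here are my own assumptions for readability, not text from KIP-336:

{code:java}
import java.io.Closeable;
import java.util.Map;
import org.apache.kafka.common.header.Headers;

// Hypothetical consolidated Serializer with a default headers-aware method.
// Note: if the 2-parameter method also got a default body, no abstract method
// would remain and @FunctionalInterface could no longer apply -- the trade-off
// discussed above.
public interface Serializer<T> extends Closeable {

    byte[] serialize(String topic, T data);

    // Headers-aware variant; delegates to the 2-parameter method by default so
    // existing implementations keep working unchanged.
    default byte[] serialize(String topic, Headers headers, T data) {
        return serialize(topic, data);
    }

    default void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    default void close() { }
}
{code}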


On Mon, Jul 9, 2018 at 11:02 AM Ismael Juma  wrote:

> Thanks for the KIP. It would be helpful to understand the user experience
> for the case where the implementor uses the headers. It seems like it would
> require overriding two methods?
>
> Ismael
>
> On Mon, Jul 9, 2018 at 1:50 AM Viktor Somogyi 
> wrote:
>
> > Hi folks,
> >
> > I've published KIP-336 which is about consolidating the
> > Serializer/Deserializer interfaces.
> >
> > Basically the story here is when ExtendedSerializer and
> > ExtendedDeserializer were added we still supported Java 7 and therefore
> had
> > to use compatible constructs which now seem unnecessary since we dropped
> > support for Java 7. Now in this KIP I propose a way to deprecate those
> > patterns:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87298242
> >
> > I'd be happy to receive some feedback about the KIP I published.
> >
> > Cheers,
> > Viktor
> >
>


Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

2018-07-10 Thread Bobby Evans
Is there any update on this? The performance improvements are quite
impressive and I really would like to stop forking Kafka just to get this
in.

Thanks,

Bobby
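
For anyone catching up on the thread: the change being discussed is exposed to clients as just another compression.type value. A minimal producer sketch follows; note that "zstd" here is the codec name used by the draft implementation and is not available in any released client at the time of this thread:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ZstdProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // Selected exactly like gzip/snappy/lz4 today; "zstd" is the proposed value.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
        }
    }
}
{code}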

On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee  wrote:

> Ismael,
>
> Oh, I forgot all of you are on working frenzy for 2.0! No problem, take
> your time. I am also working at another issue now. Thank you for letting me
> know.
>
> Best,
> Dongjin
>
> On Wed, Jun 13, 2018, 11:44 PM Ismael Juma  wrote:
>
> > Sorry for the delay Dongjin. Everyone is busy finalising 2.0.0. This KIP
> > seems like a great candidate for 2.1.0 and hopefully there will be more
> of
> > a discussion next week. :)
> >
> > Ismael
> >
> > On Wed, 13 Jun 2018, 05:17 Dongjin Lee,  wrote:
> >
> > > Hello. I just updated my draft implementation:
> > >
> > > 1. Rebased to latest trunk (commit 5145d6b)
> > > 2. Apply ZStd 1.3.4
> > >
> > > You can check out the implementation from here
> > > . If you experience any
> > problem
> > > running it, don't hesitate to give me a mention.
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee 
> wrote:
> > >
> > > > Here is the short conclusion about the license problem: *We can use
> > zstd
> > > > and zstd-jni without any problem, but we need to include their
> license,
> > > > e.g., BSD license.*
> > > >
> > > > Both of BSD 2 Clause License & 3 Clause License requires to include
> the
> > > > license used, and BSD 3 Clause License requires that the name of the
> > > > contributor can't be used to endorse or promote the product. That's
> it
> > > > <
> > >
> >
> http://www.mikestratton.net/2011/12/is-bsd-license-compatible-with-apache-2-0-license/
> > > >
> > > > - They are not listed in the list of prohibited licenses
> > > >  also.
> > > >
> > > > Here is how Spark did for it
> > > > :
> > > >
> > > > - They made a directory dedicated to the dependency license files
> > > >  and added
> > > licenses
> > > > for Zstd
> > > > <
> https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd.txt
> > >
> > > &
> > > > Zstd-jni
> > > > <
> > >
> >
> https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd-jni.txt>
> > > > .
> > > > - Added a link to the original license files in LICENSE.
> > > > 
> > > >
> > > > If needed, I can make a similar update.
> > > >
> > > > Thanks for pointing out this problem, Viktor! Nice catch!
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > >
> > > >
> > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee 
> > wrote:
> > > >
> > > >> I greatly appreciate your comprehensive reasoning. so: +1 for b
> until
> > > now.
> > > >>
> > > >> For the license issues, I will have a check on how the other projects
> > are
> > > >> doing and share the results.
> > > >>
> > > >> Best,
> > > >> Dongjin
> > > >>
> > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > > viktorsomo...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Dongjin,
> > > >>>
> > > >>> A couple of comments:
> > > >>> I would vote for option b. in the "backward compatibility" section.
> > My
> > > >>> reasoning for this is that users upgrading to a zstd compatible
> > version
> > > >>> won't start to use it automatically, so manual reconfiguration is
> > > >>> required.
> > > >>> Therefore an upgrade won't mess up the cluster. If not all the
> > clients
> > > >>> are
> > > >>> upgraded but just some of them and they'd start to use zstd then it
> > > would
> > > >>> cause errors in the cluster. I'd like to presume though that this
> is
> > a
> > > >>> very
> > > >>> obvious failure case and nobody should be surprised if it didn't
> > work.
> > > >>> I wouldn't choose a. as I think we should bump the fetch and
> produce
> > > >>> requests if it's a change in the message format. Moreover if some
> of
> > > the
> > > >>> producers and the brokers are upgraded but some of the consumers
> are
> > > not,
> > > >>> then we wouldn't prevent the error when the old consumer tries to
> > > consume
> > > >>> the zstd compressed messages.
> > > >>> I wouldn't choose c. either as I think binding the compression type
> > to
> > > an
> > > >>> API is not so obvious from the developer's perspective.
> > > >>>
> > > >>> I would also prefer to use the existing binding, however we must
> > > respect
> > > >>> the licenses:
> > > >>> "The code for these JNI bindings is licenced under 2-clause BSD
> > > license.
> > > >>> The native Zstd library is licensed under 3-clause BSD license and
> > > GPL2"
> > > >>> Based on the FAQ page
> > > >>> https://www.apache.org/legal/resolved.html#category-a
> > > >>> we may use 2- and 3-clause BSD licenses but the Apache license is
> not
> > > >>> compatible with GPL2. I'm hoping that the "3-clause BSD license and
> > > GPL2"
> > > >>> is re

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-10 Thread John Roesler
Hi Guozhang,

That sounds good to me. I'll include that in the KIP.

Thanks,
-John
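
For readers following the quoted discussion below, a minimal sketch of where the retention spec sits today: until() lives on the Windows object, even though retention is really a property of the window store, which is what Guozhang proposes to move. Topic and values here are placeholders:

{code:java}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class WindowRetentionToday {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Today, until() (i.e. "maintainMs") is declared on the window spec itself
        // and controls how long windows remain queryable in the store.
        KTable<Windowed<String>, Long> counts = builder
                .<String, String>stream("input-topic")
                .groupByKey()
                .windowedBy(TimeWindows.of(60_000L).until(24 * 60 * 60 * 1000L))
                .count();

        // 'counts' would normally be materialized or written out; omitted here.
    }
}
{code}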

On Mon, Jul 9, 2018 at 6:33 PM Guozhang Wang  wrote:

> Let me clarify a bit on what I meant about moving `retentionPeriod` to
> WindowStoreBuilder:
>
> In another discussion we had around KIP-319 / 330, that the "retention
> period" should not really be a window spec, but only a window store spec,
> as it only affects how long to retain each window to be queryable along
> with the storage cost.
>
> More specifically, today the "maintainMs" returned from Windows is used in
> three places:
>
> 1) for windowed aggregations, they are passed in directly into
> `Stores.persistentWindows()` as the retention period parameters. For this
> use case we should just let the WindowStoreBuilder to specify this value
> itself.
>
> NOTE: It is also returned in the KStreamWindowAggregate processor, to
> determine if a received record should be dropped due to its lateness. We
> may need to think of another way to get this value inside the processor
>
> 2) for windowed stream-stream join, it is used as the join range parameter
> but only to check that "windowSizeMs <= retentionPeriodMs". We can do this
> check at the store builder level instead of at the processor level.
>
>
> If we can remove its usage in both 1) and 2), then we should be able to
> safely remove this from the `Windows` spec.
>
>
> Guozhang
>
>
> On Mon, Jul 9, 2018 at 3:53 PM, John Roesler  wrote:
>
> > Thanks for the reply, Guozhang,
> >
> > Good! I agree, that is also a good reason, and I actually made use of
> that
> > in my tests. I'll update the KIP.
> >
> > By the way, I chose "allowedLateness" as I was trying to pick a better
> name
> > than "close", but I think it's actually the wrong name. We don't want to
> > bound the lateness of events in general, only with respect to the end of
> > their window.
> >
> > If we have a window [0,10), with "allowedLateness" of 5, then if we get
> an
> > event with timestamp 3 at time 9, the name implies we'd reject it, which
> > seems silly. Really, we'd only want to start rejecting that event at
> stream
> > time 15.
> >
> > What I meant was more like "allowedLatenessAfterWindowEnd", but that's
> too
> > verbose. I think that "close" + some documentation about what it means
> will
> > be better.
> >
> > 1: "Close" would be measured from the end of the window, so a reasonable
> > default would be "0". Recall that "close" really only needs to be
> specified
> > for final results, and a default of 0 would produce the most intuitive
> > results. If folks later discover that they are missing some late events,
> > they can adjust the parameter accordingly. IMHO, any other value would
> just
> > be a guess on our part.
> >
> > 2a:
> > I think you're saying to re-use "until" instead of adding "close" to the
> > window.
> >
> > The downside here would be that the semantic change could be more
> confusing
> > than deprecating "until" and introducing window "close" and a
> > "retentionTime" on the store builder. The deprecation is a good,
> controlled
> > way for us to make sure people are getting the semantics they think
> they're
> > getting, as well as giving us an opportunity to link people to the API
> they
> > should use instead.
> >
> > I didn't fully understand the second part, but it sounds like you're
> > suggesting to add a new "retentionTime" setter to Windows to bridge the
> gap
> > until we add it to the store builder? That seems kind of roundabout to
> me,
> > if that's what you meant. We could just immediately add it to the store
> > builders in the same PR.
> >
> > 2b: Sounds good to me!
> >
> > Thanks again,
> > -John
> >
> >
> > On Mon, Jul 9, 2018 at 4:55 PM Guozhang Wang  wrote:
> >
> > > John,
> > >
> > > Thanks for your replies. As for the two options of the API, I think I'm
> > > slightly inclined to the first option as well. My motivation is a bit
> > > different, as I think of the first one maybe more flexible, for
> example:
> > >
> > > KTable> table = ... count();
> > >
> > > table.toStream().peek(..);   // want to peek at the changelog stream,
> do
> > > not care about final results.
> > >
> > > table.suppress().toStream().to("topic");// sending to a topic, want
> > to
> > > only send the final results.
> > >
> > > --
> > >
> > > Besides that, I have a few more minor questions:
> > >
> > > 1. For "allowedLateness", what should be the default value? I.e. if
> user
> > do
> > > not specify "allowedLateness" in TimeWindows, what value should we set?
> > >
> > > 2. For API names, some personal suggestions here:
> > >
> > > 2.a) "allowedLateness"  -> "until" (semantics changed, and also value
> is
> > > defined as delta on top of window length), where "until" ->
> > > "retentionPeriod", and the latter will be removed from `Windows` to `
> > > WindowStoreBuilder` in the future.
> > >
> > > 2.b) "BufferConfig" -> "Buffered" ?
> > >
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, Jul 9, 2018 at 2:09 P

[jira] [Resolved] (KAFKA-7080) WindowStoreBuilder incorrectly initializes CachingWindowStore

2018-07-10 Thread John Roesler (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-7080.
-
Resolution: Fixed

Fixed in https://github.com/apache/kafka/pull/5257

> WindowStoreBuilder incorrectly initializes CachingWindowStore
> -
>
> Key: KAFKA-7080
> URL: https://issues.apache.org/jira/browse/KAFKA-7080
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0, 1.0.1, 1.1.0, 2.0.0
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Major
> Fix For: 2.1.0
>
>
> When caching is enabled on the WindowStoreBuilder, it creates a 
> CachingWindowStore. However, it incorrectly passes storeSupplier.segments() 
> (the number of segments) to the segmentInterval argument.
>  
> The impact is low, since any valid number of segments is also a valid segment 
> size, but it likely results in much smaller segments than intended. For 
> example, the segments may be sized 3ms instead of 60,000ms.
>  
> Ideally the WindowBytesStoreSupplier interface would allow suppliers to 
> advertise their segment size instead of segment count. I plan to create a KIP 
> to propose this.
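
To make the mix-up concrete, here is a sketch against the 1.x/2.0 public store factory (the values are illustrative and this is not the actual fix):

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowBytesStoreSupplier;

public class SegmentCountVsInterval {
    public static void main(String[] args) {
        // A supplier configured with a *number of segments* (here: 3) ...
        WindowBytesStoreSupplier supplier = Stores.persistentWindowStore(
                "my-window-store",
                TimeUnit.DAYS.toMillis(1),    // retention period
                3,                            // number of segments
                TimeUnit.MINUTES.toMillis(1), // window size
                false);                       // retain duplicates

        // ... the bug was that this count was handed to CachingWindowStore as the
        // segment *interval* in milliseconds, yielding far smaller segments than
        // intended (the description above gives 3 ms vs. 60,000 ms as an example).
        System.out.println("number of segments = " + supplier.segments());
    }
}
{code}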



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-322: Return new error code for DeleteTopics API when topic deletion disabled.

2018-07-10 Thread Manikumar
Waiting for one more binding vote to pass this minor KIP.  Appreciate your
vote.

On Wed, Jul 4, 2018 at 7:03 PM Eno Thereska  wrote:

> +1 (non binding)
>
> On Wed, Jul 4, 2018 at 1:19 PM, Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > +1 (non-binding)
> >
> > On Wed, Jul 4, 2018 at 5:22 PM Magnus Edenhill 
> wrote:
> >
> > > +1 (non-binding)
> > >
> > > 2018-07-04 13:40 GMT+02:00 Satish Duggana :
> > >
> > > > +1
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Wed, Jul 4, 2018 at 4:11 PM, Daniele Ascione  >
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Thanks,
> > > > > Daniele
> > > > >
> > > > > Il giorno mar 3 lug 2018 alle ore 23:55 Harsha 
> ha
> > > > > scritto:
> > > > >
> > > > > > +1.
> > > > > >
> > > > > > Thanks,
> > > > > > Harsha
> > > > > >
> > > > > > On Tue, Jul 3rd, 2018 at 9:22 AM, Ted Yu 
> > > wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > On Tue, Jul 3, 2018 at 9:05 AM, Mickael Maison <
> > > > > > mickael.mai...@gmail.com >
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 (non binding)
> > > > > > > > Thanks for the KIP
> > > > > > > >
> > > > > > > > On Tue, Jul 3, 2018 at 4:59 PM, Vahid S Hashemian
> > > > > > > > < vahidhashem...@us.ibm.com > wrote:
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > > --Vahid
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > From: Gwen Shapira < g...@confluent.io >
> > > > > > > > > To: dev < dev@kafka.apache.org >
> > > > > > > > > Date: 07/03/2018 08:49 AM
> > > > > > > > > Subject: Re: [VOTE] KIP-322: Return new error code for
> > > > > > > > DeleteTopics
> > > > > > > > > API when topic deletion disabled.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > On Tue, Jul 3, 2018 at 8:24 AM, Manikumar <
> > > > > > manikumar.re...@gmail.com >
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Manikumar < manikumar.re...@gmail.com >
> > > > > > > > >> Fri, Jun 29, 7:59 PM (4 days ago)
> > > > > > > > >> to dev
> > > > > > > > >> Hi All,
> > > > > > > > >>
> > > > > > > > >> I would like to start voting on KIP-322 which would return
> > new
> > > > > error
> > > > > > > > > code
> > > > > > > > >> for DeleteTopics API when topic deletion disabled.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.
> > > > > > > > action?pageId=87295558
> > > > > > > > >
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > *Gwen Shapira*
> > > > > > > > > Product Manager | Confluent
> > > > > > > > > 650.450.2760 | @gwenshap
> > > > > > > > > Follow us: Twitter <
> > > > > > > > > https://twitter.com/ConfluentInc
> > > > > > > > >> | blog
> > > > > > > > > <
> > > > > > > > > http://www.confluent.io/blog
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>


[VOTE] 2.0.0 RC2

2018-07-10 Thread Rajini Sivaram
Hello Kafka users, developers and client-developers,


This is the third candidate for release of Apache Kafka 2.0.0.


This is a major version release of Apache Kafka. It includes 40 new  KIPs
and

several critical bug fixes. Please see the 2.0.0 release plan for more
details:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80448820


A few notable highlights:

   - Prefixed wildcard ACLs (KIP-290), Fine grained ACLs for CreateTopics
   (KIP-277)
   - SASL/OAUTHBEARER implementation (KIP-255)
   - Improved quota communication and customization of quotas (KIP-219,
   KIP-257)
   - Efficient memory usage for down conversion (KIP-283)
   - Fix log divergence between leader and follower during fast leader
   failover (KIP-279)
   - Drop support for Java 7 and remove deprecated code including old scala
   clients
   - Connect REST extension plugin, support for externalizing secrets and
   improved error handling (KIP-285, KIP-297, KIP-298 etc.)
   - Scala API for Kafka Streams and other Streams API improvements
   (KIP-270, KIP-150, KIP-245, KIP-251 etc.)


Release notes for the 2.0.0 release:

http://home.apache.org/~rsivaram/kafka-2.0.0-rc2/RELEASE_NOTES.html


*** Please download, test and vote by Friday, July 13, 4pm PT


Kafka's KEYS file containing PGP keys we use to sign the release:

http://kafka.apache.org/KEYS


* Release artifacts to be voted upon (source and binary):

http://home.apache.org/~rsivaram/kafka-2.0.0-rc2/


* Maven artifacts to be voted upon:

https://repository.apache.org/content/groups/staging/


* Javadoc:

http://home.apache.org/~rsivaram/kafka-2.0.0-rc2/javadoc/


* Tag to be voted upon (off 2.0 branch) is the 2.0.0 tag:

https://github.com/apache/kafka/tree/2.0.0-rc2



* Documentation:

http://kafka.apache.org/20/documentation.html


* Protocol:

http://kafka.apache.org/20/protocol.html


* Successful Jenkins builds for the 2.0 branch:

Unit/integration tests: https://builds.apache.org/job/kafka-2.0-jdk8/72/

System tests: https://jenkins.confluent.io/job/system-test-kafka/job/2.0/27/


/**


Thanks,


Rajini


Re: Stream config in StreamPartitionAssignor

2018-07-10 Thread Guozhang Wang
Boyang,

I see. That makes sense. We should at least update the docs to reduce
confusion.


Guozhang
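
For anyone hitting the same confusion, a minimal sketch of the consumer-prefix trick discussed below (the particular config key is just an example):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class PrefixedConsumerConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Configs intended for the embedded consumer -- and therefore for anything
        // it instantiates, such as the partition assignor -- need the consumer prefix:
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 100);

        System.out.println(props);
    }
}
{code}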


On Mon, Jul 9, 2018 at 10:22 PM, Boyang Chen  wrote:

> Hey Guozhang,
>
>
> thanks for the reply. Actually I'm not trying to pass config into
> StreamPartitionAssignor. My debugging was about some other configs that
> weren't transferred properly (completely irrelevant), but the effort was
> delayed because the log would print "StreamsConfig values" for
> StreamPartitionAssignor, where it falls back all the configs to defaults.
> So it would give me as an end user the impression that my config passed in
> was "not successfully" conveyed to each Stream Thread.
>
>
> I'm thinking whether we could disable stream partitioner to print out its
> config in the log, or have a more explicit way saying this is not Stream
> Thread config? Hope I made myself clear to you.
>
>
> Also yes, I could put some comment in the docs to let user use Consumer
> config prefix if that could help.
>
>
> Best,
>
> Boyang
>
> 
> From: Guozhang Wang 
> Sent: Tuesday, July 10, 2018 5:35 AM
> To: dev
> Subject: Re: Stream config in StreamPartitionAssignor
>
> Hello Boyang,
>
> Thanks for bringing this up. Currently since the StreamsPartitionAssignor
> can only be initiated within the Consumer instance, I think letting users
> pass in the config values with the prefix is the preferable way, i.e. we
> can improve our docs to educate users about that. BTW I'm curious what
> configs you want to pass into the StreamsPartitionAssignor? Currently since
> there is no user configurable part inside that class, I do not know what
> configs can take effects.
>
>
> Guozhang
>
> On Mon, Jul 9, 2018 at 2:06 PM, Boyang Chen  wrote:
>
> > Hey there,
> >
> >
> > over the weekend I was debugging the streams configuration not passed
> > within threads. I noticed that one of the code path from KafkaConsumer
> > (L743) was to initialize the StreamPartitionAssignor:
> >
> > this.assignors = config.getConfiguredInstances(
> > ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
> > PartitionAssignor.class);
> >
> > However, it was using the ConsumerConfig instance (that config is passed
> > in), so if I want to make some configuration change in the assignor, I
> need
> > to put consumer prefix. To make the debugging even harder, there was an
> > logAll() function in AbstractConfig which will print "StreamsConfig
> values"
> > at the beginning, since it is indeed a stream config:
> >
> > @Override
> > public void configure(final Map configs) {
> > final StreamsConfig streamsConfig = new StreamsConfig(configs);
> >
> > (L190 in StreamPartitionAssignor)
> >
> >
> > This would further confuse developer as they see two different sets of
> > StreamsConfig: one from top level, one from this derived level per
> thread.
> >
> >
> > My point is that we could either: 1. let developer be aware that they
> need
> > to add consumer prefix to pass in configs to StreamPartitionAssignor 2.
> we
> > found a way to pass in original StreamsConfig.
> >
> > I know this is a little bit lengthy description, let me know if you feel
> > unclear about my proposal, or this is not a concern since most people
> > already know the trick here, thank you!
> >
> >
>
>
> --
> -- Guozhang
>



-- 
-- Guozhang


Re: Kafka system tests contribution

2018-07-10 Thread Colin McCabe
Hi Andriy,

Try looking at the logs to see why the test failed.

best,
Colin

On Wed, May 16, 2018, at 07:08, Andriy Sorokhtey wrote:
> Hi,
> 
> Did anyone have a chance to take a look at this issue?
> 
> 2018-05-08 15:01 GMT+03:00 Andriy Sorokhtey :
> 
> > Hello Kafka team
> >
> > I’d like to contribute to the Kafka system tests.
> >
> > I’ve tried to execute system tests locally and I have some issues. Can
> > anyone give me a hand to figure out what’s wrong?
> >
> > So, I see that multiple system tests are failing when I try to run it with
> > the docker or with vagrant.
> > Maybe there is some way to debug it using PyCharm. For example, put some
> > breakpoint and start debugging, when the test goes to the breakpoint I’d
> > like to go to instances and check what’s going on there.
> > I’ll be thankful for any advice.
> >
> >  Here is an example of one test failure:
> >
> >> [INFO:2018-05-03 06:37:19,861]: Triggering test 1 of 37...
> >> [INFO:2018-05-03 06:37:19,870]: RunnerClient: Loading test {'directory':
> >> '/opt/kafka-dev/tests/kafkatest/tests/streams', 'file_name':
> >> 'streams_broker_compatibility_test.py', 'method_name':
> >> 'test_compatible_brokers_eos_disabled', 'cls_name':
> >> 'StreamsBrokerCompatibility', 'injected_args': {'broker_version':
> >> '0.10.1.1'}}
> >> [INFO:2018-05-03 06:37:19,874]: RunnerClient: kafkatest.tests.streams.
> >> streams_broker_compatibility_test.StreamsBrokerCompatibility.
> >> test_compatible_brokers_eos_disabled.broker_version=0.10.1.1: Setting
> >> up...
> >> [INFO:2018-05-03 06:37:22,484]: RunnerClient: kafkatest.tests.streams.
> >> streams_broker_compatibility_test.StreamsBrokerCompatibility.
> >> test_compatible_brokers_eos_disabled.broker_version=0.10.1.1: Running...
> >> [INFO:2018-05-03 06:38:34,129]: RunnerClient: kafkatest.tests.streams.
> >> streams_broker_compatibility_test.StreamsBrokerCompatibility.
> >> test_compatible_brokers_eos_disabled.broker_version=0.10.1.1: FAIL:
> >> Never saw message indicating StreamsTest finished startup on 
> >> ducker@ducker05
> >> Traceback (most recent call last):
> >> File 
> >> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py",
> >> line 132, in run
> >> data = self.run_test()
> >> File 
> >> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py",
> >> line 185, in run_test
> >> return self.test_context.function(self.test)
> >> File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py",
> >> line 324, in wrapper
> >> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
> >> File "/opt/kafka-dev/tests/kafkatest/tests/streams/
> >> streams_broker_compatibility_test.py", line 84, in
> >> test_compatible_brokers_eos_disabled
> >> processor.start()
> >> File "/usr/local/lib/python2.7/dist-packages/ducktape/services/service.py",
> >> line 234, in start
> >> self.start_node(node)
> >> File "/opt/kafka-dev/tests/kafkatest/services/streams.py", line 138, in
> >> start_node
> >> monitor.wait_until('StreamsTest instance started', timeout_sec=60,
> >> err_msg="Never saw message indicating StreamsTest finished startup on " +
> >> str(node.account))
> >> File 
> >> "/usr/local/lib/python2.7/dist-packages/ducktape/cluster/remoteaccount.py",
> >> line 668, in wait_until
> >> allow_fail=True) == 0, **kwargs)
> >> File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py",
> >> line 36, in wait_until
> >> raise TimeoutError(err_msg)
> >> TimeoutError: Never saw message indicating StreamsTest finished startup
> >> on ducker@ducker05
> >
> >
> > If I figure out what's wrong I can try to fix other tests.
> >
> > --
> >
> > *Sincerely*
> > *Andriy Sorokhtey*
> > +380681890146
> >
> 
> 
> 
> -- 
> 
> *Sincerely*
> *Andriy Sorokhtey*
> +380681890146


[jira] [Created] (KAFKA-7146) Grouping consumer requests per consumer coordinator in admin client in describeConsumerGroups

2018-07-10 Thread Yishun Guan (JIRA)
Yishun Guan created KAFKA-7146:
--

 Summary: Grouping consumer requests per consumer coordinator in 
admin client in describeConsumerGroups
 Key: KAFKA-7146
 URL: https://issues.apache.org/jira/browse/KAFKA-7146
 Project: Kafka
  Issue Type: Sub-task
Reporter: Yishun Guan


Subtask of KAFKA-6788. Group consumer requests for describeConsumerGroups().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-264: Add a consumer metric to record raw fetch size

2018-07-10 Thread Vahid S Hashemian
Bump!




From:   "Vahid S Hashemian" 
To: dev 
Date:   05/25/2018 10:51 AM
Subject:[VOTE] KIP-264: Add a consumer metric to record raw fetch 
size



In the absence of additional feedback on this KIP I'd like to start a 
vote.

To summarize, the KIP simply proposes to add a consumer metric to track 
the size of raw (uncompressed) fetched messages.
The KIP can be found here: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-264%3A+Add+a+consumer+metric+to+record+raw+fetch+size


Thanks.
--Vahid








Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

2018-07-10 Thread Dong Lin
Hey Anna,

Thanks much for your detailed explanation and example! It does help me
understand the difference between our understanding.

So it seems that the solution based on findOffsets() currently focuses
mainly on the scenario that consumer has cached leaderEpoch -> offset
mapping whereas I was thinking about the general case where consumer may or
may not have this cache. I guess that is why we have different
understanding here. I have some comments below.


3) The proposed solution using findOffsets(offset, leaderEpoch) followed by
seek(offset) works if consumer has the cached leaderEpoch -> offset
mapping. But if we assume consumer has this cache, do we need to have
leaderEpoch in the findOffsets(...)? Intuitively, the findOffsets(offset)
can also derive the leaderEpoch using offset just like the proposed
solution does with seek(offset).


4) If consumer does not have cached leaderEpoch -> offset mapping, which is
the case if consumer is restarted on a new machine, then it is not clear
what leaderEpoch would be included in the FetchRequest if consumer does
seek(offset). This is the case that motivates the first question of the
previous email. In general, maybe we should discuss the final solution that
covers all cases?


5) The second question in my previous email is related to the following
paragraph:

"... In some cases, offsets returned from position() could be actual
consumed messages by this consumer identified by {offset, leader epoch}. In
other cases, position() returns offset that was not actually consumed.
Suppose, the user calls position() for the last offset...".

I guess my point is that, if user calls position() for the last offset and
uses that offset in seek(...), then user can probably just call
Consumer#seekToEnd() without calling position() and seek(...). Similarly
user can call Consumer#seekToBeginning() to the seek to the earliest
position without calling position() and seek(...). Thus position() only
needs to return the actual consumed messages identified by {offset, leader
epoch}. Does this make sense?


Thanks,
Dong
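
For readers less familiar with the API surface being discussed, a self-contained sketch of what seek() and position() expose today -- bare offsets with no leader epoch -- which is the gap that findOffsets() and the KIP aim to close (broker address, topic, and offset are placeholders):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class SeekWithoutEpoch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "kip-320-example");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singleton(tp));

            // seek() carries only an offset: there is no way to tell the broker which
            // leader epoch that offset was read under, so divergence after an unclean
            // leader election cannot be detected on the next fetch.
            consumer.seek(tp, 42L);

            // position() likewise returns a bare offset, which is why the thread
            // debates exposing an {offset, leaderEpoch} pair instead.
            System.out.println("positioned at offset " + consumer.position(tp));
        }
    }
}
{code}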


On Mon, Jul 9, 2018 at 6:47 PM, Anna Povzner  wrote:

> Hi Dong,
>
>
> Thanks for considering my suggestions.
>
>
> Based on your comments, I realized that my suggestion was not complete with
> regard to KafkaConsumer API vs. consumer-broker protocol. While I propose
> to keep KafkaConsumer#seek() unchanged and take offset only, the underlying
> consumer will send the next FetchRequest() to broker with offset and
> leaderEpoch if it is known (based on leader epoch cache in consumer) — note
> that this is different from the current KIP, which suggests to always send
> unknown leader epoch after seek(). This way, if the consumer and a broker
> agreed on the point of non-divergence, which is some {offset, leaderEpoch}
> pair, the new leader which causes another truncation (even further back)
> will be able to detect new divergence and restart the process of finding
> the new point of non-divergence. So, to answer your question, If the
> truncation happens just after the user calls
> KafkaConsumer#findOffsets(offset, leaderEpoch) followed by seek(offset),
> the user will not seek to the wrong position without knowing that
> truncation has happened, because the consumer will get another truncation
> error, and seek again.
>
>
> I am afraid, I did not understand your second question. Let me summarize my
> suggestions again, and then give an example to hopefully make my
> suggestions more clear. Also, the last part of my example shows how the
> use-case in your first question will work. If it does not answer your
> second question, would you mind clarifying? I am also focusing on the case
> of a consumer having enough entries in the cache. The case of restarting
> from committed offset either stored externally or internally will probably
> need to be discussed more.
>
>
> Let me summarize my suggestion again:
>
> 1) KafkaConsumer#seek() and KafkaConsumer#position() remains unchanged
>
> 2) New KafkaConsumer#findOffsets() takes {offset, leaderEpoch} pair per
> topic partition and returns offset per topic partition.
>
> 3) FetchRequest() to broker after KafkaConsumer#seek() will contain the
> offset set by seek and leaderEpoch that corresponds to the offset based on
> leader epoch cache in the consumer.
>
>
> The rest of this e-mail is a long and contrived example with several log
> truncations and unclean leader elections to illustrate the API and your
> first use-case. Suppose we have three brokers. Initially, Broker A, B, and
> C has one message at offset 0 with leader epoch 0. Then, Broker A goes down
> for some time. Broker B becomes a leader with epoch 1, and writes messages
> to offsets 1 and 2. Broker C fetches offset 1, but before fetching offset
> 2, becomes a leader with leader epoch 2 and writes a message at offset 2.
> Here is the state of brokers at this point:
>
> > Broker A:
> > offset 0, epoch 0 <— leader
> > goes down…
>
>
> > Broker B:
> > offset 0, epoch

Stream processing meetup at LinkedIn (Sunnyvale) on Thursday, July 19 at 6pm

2018-07-10 Thread Dong Lin
Hi everyone,

We would like to invite you to a Stream Processing Meetup at
LinkedIn’s Sunnyvale
campus, LinkedIn Building F (LSNF), 605 West Maude Avenue, on Thursday,
July 19 at 6pm.

Please RSVP here (if you intend to attend in person):
https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/251481797/

We have the following three talks scheduled:

- Beam me up Samza: How we built a Samza Runner for Apache Beam (Xinyu Liu,
LinkedIn)

- uReplicator: Uber Engineering’s Scalable Robust Kafka Replicator
(Hongliang Xu, Uber)

- Concourse - Near real time notifications platform at Linkedin (Ajith
Muralidharan & Vivek Nelamangala, LinkedIn)

Food (pizza, wings) & drink (water, beer, wine) will be provided.

Live streaming will be provided at
https://primetime.bluejeans.com/a2m/live-event/pawhxrsd


Thanks,
Dong


[jira] [Created] (KAFKA-7147) ReassignPartitionsCommand should be able to connect to broker over SSL

2018-07-10 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7147:
---

 Summary: ReassignPartitionsCommand should be able to connect to 
broker over SSL
 Key: KAFKA-7147
 URL: https://issues.apache.org/jira/browse/KAFKA-7147
 Project: Kafka
  Issue Type: Improvement
Reporter: Dong Lin
Assignee: Dong Lin


Currently both ReassignPartitionsCommand and LogDirsCommand instantiate 
AdminClient using only the bootstrap.servers and client.id provided by the user. Since 
they do not provide other SSL-related properties, these tools will not be able 
to talk to brokers over SSL.

 

In order to solve this problem, these tools should allow users to provide a 
properties file containing configs to be passed to the AdminClient.
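
For illustration, a sketch of the kind of AdminClient configuration such a properties file would need to carry (hostnames, paths, and passwords are placeholders; this is not the proposed patch):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.config.SslConfigs;

public class SslAdminClientSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1.example.com:9093");
        props.put(AdminClientConfig.SECURITY_PROTOCOL_CONFIG, "SSL");
        props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/path/to/client.truststore.jks");
        props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "changeit");

        // Today the tools forward only bootstrap.servers and client.id, so SSL
        // settings like the ones above never reach the AdminClient they create.
        try (AdminClient admin = AdminClient.create(props)) {
            System.out.println("AdminClient created over SSL: " + admin);
        }
    }
}
{code}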

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

2018-07-10 Thread Anna Povzner
Hi Dong,

Thanks for the follow up! I finally have much more clear understanding of
where you are coming from.

You are right. The success of findOffsets()/finding a point of
non-divergence depends on whether we have enough entries in the consumer's
leader epoch cache. However, I think this is a fundamental limitation of
having a leader epoch cache that does not persist across consumer restarts.

If we consider the general case where consumer may or may not have this
cache, then I see two paths:
1) Letting the user track the leader epoch history externally, and having
more exposure to leader epochs and to finding the point of non-divergence in
the KafkaConsumer API. I understand this is the case you were talking about.



On Tue, Jul 10, 2018 at 12:16 PM Dong Lin  wrote:

> Hey Anna,
>
> Thanks much for your detailed explanation and example! It does help me
> understand the difference between our understanding.
>
> So it seems that the solution based on findOffsets() currently focuses
> mainly on the scenario that consumer has cached leaderEpoch -> offset
> mapping whereas I was thinking about the general case where consumer may or
> may not have this cache. I guess that is why we have different
> understanding here. I have some comments below.
>
>
> 3) The proposed solution using findOffsets(offset, leaderEpoch) followed by
> seek(offset) works if consumer has the cached leaderEpoch -> offset
> mapping. But if we assume consumer has this cache, do we need to have
> leaderEpoch in the findOffsets(...)? Intuitively, the findOffsets(offset)
> can also derive the leaderEpoch using offset just like the proposed
> solution does with seek(offset).
>
>
> 4) If consumer does not have cached leaderEpoch -> offset mapping, which is
> the case if consumer is restarted on a new machine, then it is not clear
> what leaderEpoch would be included in the FetchRequest if consumer does
> seek(offset). This is the case that motivates the first question of the
> previous email. In general, maybe we should discuss the final solution that
> covers all cases?
>
>
> 5) The second question in my previous email is related to the following
> paragraph:
>
> "... In some cases, offsets returned from position() could be actual
> consumed messages by this consumer identified by {offset, leader epoch}. In
> other cases, position() returns offset that was not actually consumed.
> Suppose, the user calls position() for the last offset...".
>
> I guess my point is that, if user calls position() for the last offset and
> uses that offset in seek(...), then user can probably just call
> Consumer#seekToEnd() without calling position() and seek(...). Similarly
> user can call Consumer#seekToBeginning() to the seek to the earliest
> position without calling position() and seek(...). Thus position() only
> needs to return the actual consumed messages identified by {offset, leader
> epoch}. Does this make sense?
>
>
> Thanks,
> Dong
>
>
> On Mon, Jul 9, 2018 at 6:47 PM, Anna Povzner  wrote:
>
> > Hi Dong,
> >
> >
> > Thanks for considering my suggestions.
> >
> >
> > Based on your comments, I realized that my suggestion was not complete
> with
> > regard to KafkaConsumer API vs. consumer-broker protocol. While I propose
> > to keep KafkaConsumer#seek() unchanged and take offset only, the
> underlying
> > consumer will send the next FetchRequest() to broker with offset and
> > leaderEpoch if it is known (based on leader epoch cache in consumer) —
> note
> > that this is different from the current KIP, which suggests to always
> send
> > unknown leader epoch after seek(). This way, if the consumer and a broker
> > agreed on the point of non-divergence, which is some {offset,
> leaderEpoch}
> > pair, the new leader which causes another truncation (even further back)
> > will be able to detect new divergence and restart the process of finding
> > the new point of non-divergence. So, to answer your question, If the
> > truncation happens just after the user calls
> > KafkaConsumer#findOffsets(offset, leaderEpoch) followed by seek(offset),
> > the user will not seek to the wrong position without knowing that
> > truncation has happened, because the consumer will get another truncation
> > error, and seek again.
> >
> >
> > I am afraid, I did not understand your second question. Let me summarize
> my
> > suggestions again, and then give an example to hopefully make my
> > suggestions more clear. Also, the last part of my example shows how the
> > use-case in your first question will work. If it does not answer your
> > second question, would you mind clarifying? I am also focusing on the
> case
> > of a consumer having enough entries in the cache. The case of restarting
> > from committed offset either stored externally or internally will
> probably
> > need to be discussed more.
> >
> >
> > Let me summarize my suggestion again:
> >
> > 1) KafkaConsumer#seek() and KafkaConsumer#position() remains unchanged
> >
> > 2) New KafkaConsumer#findOffsets() takes {offset, l

Re: Stream config in StreamPartitionAssignor

2018-07-10 Thread Boyang Chen
Sounds good. I will take a look at the doc and get back to you!



From: Guozhang Wang 
Sent: Wednesday, July 11, 2018 2:07 AM
To: dev
Subject: Re: Stream config in StreamPartitionAssignor

Boyang,

I see. That makes sense. We should at least update the docs to reduce
confusion.


Guozhang


On Mon, Jul 9, 2018 at 10:22 PM, Boyang Chen  wrote:

> Hey Guozhang,
>
>
> thanks for the reply. Actually I'm not trying to pass config into
> StreamPartitionAssignor. My debugging was about some other configs that
> weren't transferred properly (completely irrelevant), but the effort was
> delayed because the log would print "StreamsConfig values" for
> StreamPartitionAssignor, where it falls back all the configs to defaults.
> So it would give me as an end user the impression that my config passed in
> was "not successfully" conveyed to each Stream Thread.
>
>
> I'm thinking whether we could disable stream partitioner to print out its
> config in the log, or have a more explicit way saying this is not Stream
> Thread config? Hope I made myself clear to you.
>
>
> Also yes, I could put some comment in the docs to let user use Consumer
> config prefix if that could help.
>
>
> Best,
>
> Boyang
>
> 
> From: Guozhang Wang 
> Sent: Tuesday, July 10, 2018 5:35 AM
> To: dev
> Subject: Re: Stream config in StreamPartitionAssignor
>
> Hello Boyang,
>
> Thanks for bringing this up. Currently since the StreamsPartitionAssignor
> can only be initiated within the Consumer instance, I think letting users
> pass in the config values with the prefix is the preferable way, i.e. we
> can improve our docs to educate users about that. BTW I'm curious what
> configs you want to pass into the StreamsPartitionAssignor? Currently since
> there is no user configurable part inside that class, I do not know what
> configs can take effects.
>
>
> Guozhang
>
> On Mon, Jul 9, 2018 at 2:06 PM, Boyang Chen  wrote:
>
> > Hey there,
> >
> >
> > over the weekend I was debugging the streams configuration not passed
> > within threads. I noticed that one of the code path from KafkaConsumer
> > (L743) was to initialize the StreamPartitionAssignor:
> >
> > this.assignors = config.getConfiguredInstances(
> > ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
> > PartitionAssignor.class);
> >
> > However, it was using the ConsumerConfig instance (that config is passed
> > in), so if I want to make some configuration change in the assignor, I
> need
> > to put consumer prefix. To make the debugging even harder, there was an
> > logAll() function in AbstractConfig which will print "StreamsConfig
> values"
> > at the beginning, since it is indeed a stream config:
> >
> > @Override
> > public void configure(final Map configs) {
> > final StreamsConfig streamsConfig = new StreamsConfig(configs);
> >
> > (L190 in StreamPartitionAssignor)
> >
> >
> > This would further confuse developer as they see two different sets of
> > StreamsConfig: one from top level, one from this derived level per
> thread.
> >
> >
> > My point is that we could either: 1. let developer be aware that they
> need
> > to add consumer prefix to pass in configs to StreamPartitionAssignor 2.
> we
> > found a way to pass in original StreamsConfig.
> >
> > I know this is a little bit lengthy description, let me know if you feel
> > unclear about my proposal, or this is not a concern since most people
> > already know the trick here, thank you!
> >
> >
>
>
> --
> -- Guozhang
>



--
-- Guozhang


Re: [VOTE] 2.0.0 RC2

2018-07-10 Thread Ted Yu
+1

Ran thru test suite.

Checked signatures.

On Tue, Jul 10, 2018 at 10:17 AM Rajini Sivaram 
wrote:

> Hello Kafka users, developers and client-developers,
>
>
> This is the third candidate for release of Apache Kafka 2.0.0.
>
>
> This is a major version release of Apache Kafka. It includes 40 new  KIPs
> and
>
> several critical bug fixes. Please see the 2.0.0 release plan for more
> details:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80448820
>
>
> A few notable highlights:
>
>- Prefixed wildcard ACLs (KIP-290), Fine grained ACLs for CreateTopics
>(KIP-277)
>- SASL/OAUTHBEARER implementation (KIP-255)
>- Improved quota communication and customization of quotas (KIP-219,
>KIP-257)
>- Efficient memory usage for down conversion (KIP-283)
>- Fix log divergence between leader and follower during fast leader
>failover (KIP-279)
>- Drop support for Java 7 and remove deprecated code including old scala
>clients
>- Connect REST extension plugin, support for externalizing secrets and
>improved error handling (KIP-285, KIP-297, KIP-298 etc.)
>- Scala API for Kafka Streams and other Streams API improvements
>(KIP-270, KIP-150, KIP-245, KIP-251 etc.)
>
>
> Release notes for the 2.0.0 release:
>
> http://home.apache.org/~rsivaram/kafka-2.0.0-rc2/RELEASE_NOTES.html
>
>
> *** Please download, test and vote by Friday, July 13, 4pm PT
>
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
>
> http://kafka.apache.org/KEYS
>
>
> * Release artifacts to be voted upon (source and binary):
>
> http://home.apache.org/~rsivaram/kafka-2.0.0-rc2/
>
>
> * Maven artifacts to be voted upon:
>
> https://repository.apache.org/content/groups/staging/
>
>
> * Javadoc:
>
> http://home.apache.org/~rsivaram/kafka-2.0.0-rc2/javadoc/
>
>
> * Tag to be voted upon (off 2.0 branch) is the 2.0.0 tag:
>
> https://github.com/apache/kafka/tree/2.0.0-rc2
>
>
>
> * Documentation:
>
> http://kafka.apache.org/20/documentation.html
>
>
> * Protocol:
>
> http://kafka.apache.org/20/protocol.html
>
>
> * Successful Jenkins builds for the 2.0 branch:
>
> Unit/integration tests: https://builds.apache.org/job/kafka-2.0-jdk8/72/
>
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka/job/2.0/27/
>
>
> /**
>
>
> Thanks,
>
>
> Rajini
>


[jira] [Resolved] (KAFKA-7141) kafka-consumer-group doesn't describe existing group

2018-07-10 Thread Vahid Hashemian (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian resolved KAFKA-7141.

Resolution: Not A Problem

> kafka-consumer-group doesn't describe existing group
> 
>
> Key: KAFKA-7141
> URL: https://issues.apache.org/jira/browse/KAFKA-7141
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.11.0.0, 1.0.1
>Reporter: Bohdana Panchenko
>Priority: Major
>
> I am running two consumers: an akka-stream-kafka consumer with the standard config 
> section as described in 
> [https://doc.akka.io/docs/akka-stream-kafka/current/consumer.html], and 
> kafka-console-consumer.
> The akka-stream-kafka consumer configuration looks like this:
> akka.kafka.consumer {
>   kafka-clients {
>     group.id = "myakkastreamkafka-1"
>     enable.auto.commit = false
>   }
> }
>  
> I am able to see both groups with the command
>  
> *kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --list*
> _Note: This will not show information about old Zookeeper-based consumers._
>  
> myakkastreamkafka-1
> console-consumer-57171
> I am able to view details about the console consumer group:
> *kafka-consumer-groups --describe --bootstrap-server 127.0.0.1:9092 --group 
> console-consumer-57171*
> _Note: This will not show information about old Zookeeper-based consumers._
> TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
> STREAM-TEST 0 0 0 0 consumer-1-6b928e07-196a-4322-9928-068681617878 /172.19.0.4 consumer-1
> But the command to describe my akka stream consumer gives me empty output:
> *kafka-consumer-groups --describe --bootstrap-server 127.0.0.1:9092 --group 
> myakkastreamkafka-1*
> _Note: This will not show information about old Zookeeper-based consumers._
> TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
>  
> That is strange. Can you please check the issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

2018-07-10 Thread Anna Povzner
Sorry, I hit "send" before finishing. Continuing...


2) Hiding most of the consumer-side log-truncation handling logic, with minimal
exposure in the KafkaConsumer API. I was proposing this path.


Before answering your specific questions, I want to respond to your comment
“In general, maybe we should discuss the final solution that covers all
cases?”. With the current KIP, we don’t cover all cases of the consumer detecting
log truncation, because the KIP proposes a leader epoch cache in the consumer
that does not persist across restarts. Plus, we only store the last committed
offset (either internally, or users can store it externally). This has the
limitation that the consumer will not always be able to find the point of
truncation, simply because we have a limited history (just one data point).


So, maybe we should first agree on whether we accept that storing the last
committed offset/leader epoch has the limitation that the consumer will not
be able to detect log truncation in all cases?


Thanks,

Anna

On Tue, Jul 10, 2018 at 2:20 PM Anna Povzner  wrote:

> Hi Dong,
>
> Thanks for the follow up! I finally have much more clear understanding of
> where you are coming from.
>
> You are right. The success of findOffsets()/finding a point of
> non-divergence depends on whether we have enough entries in the consumer's
> leader epoch cache. However, I think this is a fundamental limitation of
> having a leader epoch cache that does not persist across consumer restarts.
>
> If we consider the general case where consumer may or may not have this
> cache, then I see two paths:
> 1) Letting the user to track the leader epoch history externally, and have
> more exposure to leader epoch and finding point of non-divergence in
> KafkaConsumer API. I understand this is the case you were talking about.
>
>
>
> On Tue, Jul 10, 2018 at 12:16 PM Dong Lin  wrote:
>
>> Hey Anna,
>>
>> Thanks much for your detailed explanation and example! It does help me
>> understand the difference between our understanding.
>>
>> So it seems that the solution based on findOffsets() currently focuses
>> mainly on the scenario that consumer has cached leaderEpoch -> offset
>> mapping whereas I was thinking about the general case where consumer may
>> or
>> may not have this cache. I guess that is why we have different
>> understanding here. I have some comments below.
>>
>>
>> 3) The proposed solution using findOffsets(offset, leaderEpoch) followed
>> by
>> seek(offset) works if consumer has the cached leaderEpoch -> offset
>> mapping. But if we assume consumer has this cache, do we need to have
>> leaderEpoch in the findOffsets(...)? Intuitively, the findOffsets(offset)
>> can also derive the leaderEpoch using offset just like the proposed
>> solution does with seek(offset).
>>
>>
>> 4) If consumer does not have cached leaderEpoch -> offset mapping, which
>> is
>> the case if consumer is restarted on a new machine, then it is not clear
>> what leaderEpoch would be included in the FetchRequest if consumer does
>> seek(offset). This is the case that motivates the first question of the
>> previous email. In general, maybe we should discuss the final solution
>> that
>> covers all cases?
>>
>>
>> 5) The second question in my previous email is related to the following
>> paragraph:
>>
>> "... In some cases, offsets returned from position() could be actual
>> consumed messages by this consumer identified by {offset, leader epoch}.
>> In
>> other cases, position() returns offset that was not actually consumed.
>> Suppose, the user calls position() for the last offset...".
>>
>> I guess my point is that, if user calls position() for the last offset and
>> uses that offset in seek(...), then user can probably just call
>> Consumer#seekToEnd() without calling position() and seek(...). Similarly
>> user can call Consumer#seekToBeginning() to the seek to the earliest
>> position without calling position() and seek(...). Thus position() only
>> needs to return the actual consumed messages identified by {offset, leader
>> epoch}. Does this make sense?
>>
>>
>> Thanks,
>> Dong
>>
>>
>> On Mon, Jul 9, 2018 at 6:47 PM, Anna Povzner  wrote:
>>
>> > Hi Dong,
>> >
>> >
>> > Thanks for considering my suggestions.
>> >
>> >
>> > Based on your comments, I realized that my suggestion was not complete
>> with
>> > regard to KafkaConsumer API vs. consumer-broker protocol. While I
>> propose
>> > to keep KafkaConsumer#seek() unchanged and take offset only, the
>> underlying
>> > consumer will send the next FetchRequest() to broker with offset and
>> > leaderEpoch if it is known (based on leader epoch cache in consumer) —
>> note
>> > that this is different from the current KIP, which suggests to always
>> send
>> > unknown leader epoch after seek(). This way, if the consumer and a
>> broker
>> > agreed on the point of non-divergence, which is some {offset,
>> leaderEpoch}
>> > pair, the new leader which causes another truncation (even further back)
>> > will be able to detect new divergence and restart the 

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-10 Thread John Roesler
I had some opportunity to reflect on the default for close time today...

Note that the current "close time" is equal to the retention time, and
therefore "close" today shares the default retention of 24h.

It would definitely break any application that today specifies a retention
time if we defaulted "close" to something shorter than that time. It's also
likely to break apps if they *don't* set the retention time and rely on the
24h default. So it's unfortunate, but I think that if "close" isn't set, we
should use the retention time instead of a fixed default.
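
(For concreteness, a minimal sketch of where that retention time comes from in
today's API, namely Windows#until, which is the value "close" would default to
under this proposal. The surrounding topology is elided and the numbers are
arbitrary.)

// Sketch only: 5-minute windows retained for 24h. Today this retention,
// set via until(), is also the effective "close" time.
import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.kstream.TimeWindows;

class WindowRetentionSketch {
    static TimeWindows fiveMinuteWindows() {
        return TimeWindows
                .of(TimeUnit.MINUTES.toMillis(5))
                .until(TimeUnit.HOURS.toMillis(24));
    }
}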

When we ultimately remove the retention time parameter ("until"), we will
have to set "close" to a default of 24h.

Of course, this has a negative impact on the users of "final results", since
they won't see any output at all for the retention time (i.e., 24h by
default), and may find this confusing. What can we do about this except
document it well? Maybe log a warning if we see that close wasn't explicitly
set while using "final results"?

Thanks,
-John

On Tue, Jul 10, 2018 at 10:46 AM John Roesler  wrote:

> Hi Guozhang,
>
> That sounds good to me. I'll include that in the KIP.
>
> Thanks,
> -John
>
> On Mon, Jul 9, 2018 at 6:33 PM Guozhang Wang  wrote:
>
>> Let me clarify a bit on what I meant about moving `retentionPeriod` to
>> WindowStoreBuilder:
>>
>> In another discussion we had around KIP-319 / 330, that the "retention
>> period" should not really be a window spec, but only a window store spec,
>> as it only affects how long to retain each window to be queryable along
>> with the storage cost.
>>
>> More specifically, today the "maintainMs" returned from Windows is used in
>> three places:
>>
>> 1) for windowed aggregations, they are passed in directly into
>> `Stores.persistentWindows()` as the retention period parameters. For this
>> use case we should just let the WindowStoreBuilder to specify this value
>> itself.
>>
>> NOTE: It is also returned in the KStreamWindowAggregate processor, to
>> determine if a received record should be dropped due to its lateness. We
>> may need to think of another way to get this value inside the processor
>>
>> 2) for windowed stream-stream join, it is used as the join range parameter
>> but only to check that "windowSizeMs <= retentionPeriodMs". We can do this
>> check at the store builder lever instead of at the processor level.
>>
>>
>> If we can remove its usage in both 1) and 2), then we should be able to
>> safely remove this from the `Windows` spec.
>>
>>
>> Guozhang
>>
>>
>> On Mon, Jul 9, 2018 at 3:53 PM, John Roesler  wrote:
>>
>> > Thanks for the reply, Guozhang,
>> >
>> > Good! I agree, that is also a good reason, and I actually made use of
>> that
>> > in my tests. I'll update the KIP.
>> >
>> > By the way, I chose "allowedLateness" as I was trying to pick a better
>> name
>> > than "close", but I think it's actually the wrong name. We don't want to
>> > bound the lateness of events in general, only with respect to the end of
>> > their window.
>> >
>> > If we have a window [0,10), with "allowedLateness" of 5, then if we get
>> an
>> > event with timestamp 3 at time 9, the name implies we'd reject it, which
>> > seems silly. Really, we'd only want to start rejecting that event at
>> stream
>> > time 15.
>> >
>> > What I meant was more like "allowedLatenessAfterWindowEnd", but that's
>> too
>> > verbose. I think that "close" + some documentation about what it means
>> will
>> > be better.
>> >
>> > 1: "Close" would be measured from the end of the window, so a reasonable
>> > default would be "0". Recall that "close" really only needs to be
>> specified
>> > for final results, and a default of 0 would produce the most intuitive
>> > results. If folks later discover that they are missing some late events,
>> > they can adjust the parameter accordingly. IMHO, any other value would
>> just
>> > be a guess on our part.
>> >
>> > 2a:
>> > I think you're saying to re-use "until" instead of adding "close" to the
>> > window.
>> >
>> > The downside here would be that the semantic change could be more
>> confusing
>> > than deprecating "until" and introducing window "close" and a
>> > "retentionTime" on the store builder. The deprecation is a good,
>> controlled
>> > way for us to make sure people are getting the semantics they think
>> they're
>> > getting, as well as giving us an opportunity to link people to the API
>> they
>> > should use instead.
>> >
>> > I didn't fully understand the second part, but it sounds like you're
>> > suggesting to add a new "retentionTime" setter to Windows to bridge the
>> gap
>> > until we add it to the store builder? That seems kind of roundabout to
>> me,
>> > if that's what you meant. We could just immediately add it to the store
>> > builders in the same PR.
>> >
>> > 2b: Sounds good to me!
>> >
>> > Thanks again,
>> > -John
>> >
>> >
>> > On Mon, Jul 9, 2018 at 4:55 PM Guozhang Wang 
>> wrote:
>> >
>> > > John,
>> > >
>> > > Thanks for your replies. As for the two options of the API, I think
>> I'm
>> > > slightl

Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

2018-07-10 Thread Dong Lin
Hey Anna,

Thanks for the comment. To answer your question, it seems that we can cover
all cases in this KIP. As stated in the "Consumer Handling" section, a
KIP-101-based approach will be used to derive the truncation offset from the
2-tuple (offset, leaderEpoch). This approach is best-effort, and it is
inaccurate only in very rare scenarios (as described in KIP-279).

By using seek(offset, leaderEpoch), the consumer will still be able to follow
this best-effort approach to detect log truncation and determine the
truncation offset. On the other hand, if we use seek(offset), the consumer
will not detect log truncation in some cases, which weakens the guarantee of
this KIP. Does this make sense?
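
(A rough sketch of the two call sites being compared, for clarity. The
three-argument seek below is the addition this KIP proposes, not an existing
KafkaConsumer method, and lastConsumedOffset / lastConsumedLeaderEpoch are
placeholders for values the application tracked itself.)

TopicPartition tp = new TopicPartition("test-topic", 0);

// Proposed (hypothetical signature): the attached leader epoch lets the
// consumer detect divergence even when its leader epoch cache is empty.
consumer.seek(tp, lastConsumedOffset, lastConsumedLeaderEpoch);

// Existing API: only the offset is carried, so truncation after the seek can
// go undetected in some cases.
consumer.seek(tp, lastConsumedOffset);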

Thanks,
Dong

On Tue, Jul 10, 2018 at 3:14 PM, Anna Povzner  wrote:

> Sorry, I hit "send" before finishing. Continuing...
>
>
> 2) Hiding most of the consumer handling log truncation logic with minimal
> exposure in KafkaConsumer API.  I was proposing this path.
>
>
> Before answering your specific questions… I want to answer to your comment
> “In general, maybe we should discuss the final solution that covers all
> cases?”. With current KIP, we don’t cover all cases of consumer detecting
> log truncation because the KIP proposes a leader epoch cache in consumer
> that does not persist across restarts. Plus, we only store last committed
> offset (either internally or users can store externally). This has a
> limitation that the consumer will not always be able to find point of
> truncation just because we have a limited history (just one data point).
>
>
> So, maybe we should first agree on whether we accept that storing last
> committed offset/leader epoch has a limitation that the consumer will not
> be able to detect log truncation in all cases?
>
>
> Thanks,
>
> Anna
>
> On Tue, Jul 10, 2018 at 2:20 PM Anna Povzner  wrote:
>
> > Hi Dong,
> >
> > Thanks for the follow up! I finally have much more clear understanding of
> > where you are coming from.
> >
> > You are right. The success of findOffsets()/finding a point of
> > non-divergence depends on whether we have enough entries in the
> consumer's
> > leader epoch cache. However, I think this is a fundamental limitation of
> > having a leader epoch cache that does not persist across consumer
> restarts.
> >
> > If we consider the general case where consumer may or may not have this
> > cache, then I see two paths:
> > 1) Letting the user to track the leader epoch history externally, and
> have
> > more exposure to leader epoch and finding point of non-divergence in
> > KafkaConsumer API. I understand this is the case you were talking about.
> >
> >
> >
> > On Tue, Jul 10, 2018 at 12:16 PM Dong Lin  wrote:
> >
> >> Hey Anna,
> >>
> >> Thanks much for your detailed explanation and example! It does help me
> >> understand the difference between our understanding.
> >>
> >> So it seems that the solution based on findOffsets() currently focuses
> >> mainly on the scenario that consumer has cached leaderEpoch -> offset
> >> mapping whereas I was thinking about the general case where consumer may
> >> or
> >> may not have this cache. I guess that is why we have different
> >> understanding here. I have some comments below.
> >>
> >>
> >> 3) The proposed solution using findOffsets(offset, leaderEpoch) followed
> >> by
> >> seek(offset) works if consumer has the cached leaderEpoch -> offset
> >> mapping. But if we assume consumer has this cache, do we need to have
> >> leaderEpoch in the findOffsets(...)? Intuitively, the
> findOffsets(offset)
> >> can also derive the leaderEpoch using offset just like the proposed
> >> solution does with seek(offset).
> >>
> >>
> >> 4) If consumer does not have cached leaderEpoch -> offset mapping, which
> >> is
> >> the case if consumer is restarted on a new machine, then it is not clear
> >> what leaderEpoch would be included in the FetchRequest if consumer does
> >> seek(offset). This is the case that motivates the first question of the
> >> previous email. In general, maybe we should discuss the final solution
> >> that
> >> covers all cases?
> >>
> >>
> >> 5) The second question in my previous email is related to the following
> >> paragraph:
> >>
> >> "... In some cases, offsets returned from position() could be actual
> >> consumed messages by this consumer identified by {offset, leader epoch}.
> >> In
> >> other cases, position() returns offset that was not actually consumed.
> >> Suppose, the user calls position() for the last offset...".
> >>
> >> I guess my point is that, if user calls position() for the last offset
> and
> >> uses that offset in seek(...), then user can probably just call
> >> Consumer#seekToEnd() without calling position() and seek(...). Similarly
> >> user can call Consumer#seekToBeginning() to the seek to the earliest
> >> position without calling position() and seek(...). Thus position() only
> >> needs to return the actual consumed messages identified by {offset,
> leader
> >> epoch}. Does this make sense?
> >>
> >>
>

[jira] [Created] (KAFKA-7148) Kafka load log very slow after goes down with outOfMemoryError

2018-07-10 Thread wang (JIRA)
wang created KAFKA-7148:
---

 Summary: Kafka load log very slow after goes down with 
outOfMemoryError
 Key: KAFKA-7148
 URL: https://issues.apache.org/jira/browse/KAFKA-7148
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.1.1
Reporter: wang


Two questions:

1. Is there any way to avoid the "IOException: Map failed", or is there a limit 
related to the VM memory size?

2. Is it normal for Kafka to take 20+ seconds to load one partition's log file?

 

Detailed info:

1. Linux version:
 kafka_2.11-0.10.1.1> cat /etc/SuSE-release
 SUSE Linux Enterprise Server 11 (x86_64)
 VERSION = 11
 PATCHLEVEL = 3
 
2. VM info: 4C32G (4 cores, 32 GB RAM)

3. java -version
 java version "1.8.0_131"
 Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
 Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

 

4. Start command:

java -Xmx16G -Xms16G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 
-XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC 
-Djava.awt.headless=true 
-Xloggc:/opt/***/kafka_2.11-0.10.1.1/bin/../logs/kafkaServer-gc.log -verbose:gc 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps 
-Dkafka.logs.dir=/opt/***/kafka_2.11-0.10.1.1/bin/../logs 
-Dlog4j.configuration=file:./../config/log4j.properties -cp :


5. Brokers & topics

We have 3 brokers and 3 ZooKeeper nodes, with 4 topics in this Kafka cluster:
 __consumer_offsets: 50 partitions, 3 replicas
 topic1: 5 partitions, 3 replicas
 topic2: 160 partitions, 3 replicas
 topic3: 160 partitions, 3 replicas

Total data disk usage: 32G
 du -sh data/
 32G data/


6. Logs

[2018-07-10 17:23:59,728] FATAL [Replica Manager on Broker 1]: Halting due to 
unrecoverable I/O error while handling produce request: 
(kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log 
'**-13'
 at kafka.log.Log.append(Log.scala:349)
 at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
 at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
 at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
 at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
 at 
kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
 at 
kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
 at scala.collection.AbstractTraversable.map(Traversable.scala:104)
 at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
 at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
 at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:436)
 at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
 at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
 at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
 at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
 at kafka.log.LogSegment.<init>(LogSegment.scala:67)
 at kafka.log.Log.roll(Log.scala:778)
 at kafka.log.Log.maybeRoll(Log.scala:744)
 at kafka.log.Log.append(Log.scala:405)
 ... 22 more
Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
 ... 28 more

 

7. Then I followed this 
(https://issues.apache.org/jira/browse/KAFKA-6165?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22sun.nio.ch.FileChannelImpl.map%20Map%20failed%22)

and increased vm.max_map_count from its default value (65536) to 262144 (see 
the standalone sketch at the end of this report for how this limit surfaces):
 
 /sbin/sysctl -a |grep map 
 vm.max_map_count = 262144
 
 cat /proc/2860/maps |wc -l
 1195

and changed kafka-run-class.sh to replace `-XX:+DisableExplicitGC` with 
`-XX:+ExplicitGCInvokesConcurrent`.


8. But the "IOException: Map failed" problem still existed.
 
 We then upgraded the VM to 4C64G (4 cores, 64 GB RAM) and changed -Xmx16G -Xms16G to 
-Xmx4G -Xms4G.


9. Slow log loading:
 
 [2018-07-10 17:35:33,481] INFO Completed load of log ***-37 with 2 log 
segments and log end offset 2441365 in 15678 ms (kafka.log.Log)
 [2018-07-10 17:35:33,484] WARN Found a corrupted index file due to requirement 
failed: Corrupt index found, index file 
(/opt/***/data/***-34/02451611
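
(For reference, a standalone sketch not taken from this report, illustrating how
a per-process mmap limit such as vm.max_map_count surfaces as "Map failed": each
FileChannel.map call, like each mapped Kafka index file, consumes one entry in
/proc/<pid>/maps until the OS refuses. Assumes Linux and a writable /tmp.)

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

class MapFailedSketch {
    public static void main(String[] args) throws Exception {
        List<MappedByteBuffer> keepAlive = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile("/tmp/mapfailed.bin", "rw");
             FileChannel channel = file.getChannel()) {
            file.setLength(4096L * 1_000_000L); // sparse file; plenty of room to map
            for (long i = 0; ; i++) {
                // One /proc/<pid>/maps entry per call; eventually this throws
                // java.io.IOException: Map failed, as in the stack trace above.
                keepAlive.add(channel.map(FileChannel.MapMode.READ_WRITE, i * 4096, 4096));
            }
        }
    }
}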

[jira] [Created] (KAFKA-7149) Reduce assignment data size to improve kafka streams scalability

2018-07-10 Thread Ashish Surana (JIRA)
Ashish Surana created KAFKA-7149:


 Summary: Reduce assignment data size to improve kafka streams 
scalability
 Key: KAFKA-7149
 URL: https://issues.apache.org/jira/browse/KAFKA-7149
 Project: Kafka
  Issue Type: Improvement
Reporter: Ashish Surana


We observed that when we have a high number of partitions, instances, or 
stream-threads, the assignment-data size grows too fast and we start getting the 
exception below at the Kafka broker:
RecordTooLargeException
Resolution of this issue is explained at: 
https://issues.apache.org/jira/browse/KAFKA-6976

Still, this limits the scalability of Kafka Streams, as moving around 100 MB of 
assignment data for each rebalance affects performance and reliability (timeout 
exceptions start appearing) as well. It also limits Kafka Streams scale even with 
a high max.message.bytes setting, as the data size increases quickly with the 
number of partitions, instances, or stream-threads.

 

Solution:

To address this issue in our cluster, we are sending compressed 
assignment data. We saw the assignment-data size reduced by 8x-10x. This improved 
Kafka Streams scalability drastically for us, and we can now run it with 
more than 8,000 partitions.
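
(For illustration only, not the actual patch: a minimal sketch of compressing the
already-serialized assignment bytes with plain java.util.zip before they travel
in the rebalance protocol. The helper name is made up.)

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

class AssignmentCompression {
    // Compress the serialized assignment payload; the receiver would apply the
    // inverse GZIPInputStream step before decoding.
    static byte[] compress(byte[] serializedAssignment) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(serializedAssignment);
        }
        return out.toByteArray();
    }
}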



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] 2.0.0 RC2

2018-07-10 Thread Brett Rann
+1 (non-binding)
Rolling upgrade of a tiny shared staging multitenancy (200+ consumer groups)
cluster from 1.1 to 2.0.0-rc1 to 2.0.0-rc2. The cluster looks healthy after the
upgrade. The lack of Burrow lag suggests consumers are still happy, and
incoming message rates remain the same. Will monitor.

On Wed, Jul 11, 2018 at 3:17 AM Rajini Sivaram 
wrote:

> Hello Kafka users, developers and client-developers,
>
>
> This is the third candidate for release of Apache Kafka 2.0.0.
>
>
> This is a major version release of Apache Kafka. It includes 40 new KIPs
> and
>
> several critical bug fixes. Please see the 2.0.0 release plan for more
> details:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80448820
> 
>
>
> A few notable highlights:
>
> - Prefixed wildcard ACLs (KIP-290), Fine grained ACLs for CreateTopics
> (KIP-277)
> - SASL/OAUTHBEARER implementation (KIP-255)
> - Improved quota communication and customization of quotas (KIP-219,
> KIP-257)
> - Efficient memory usage for down conversion (KIP-283)
> - Fix log divergence between leader and follower during fast leader
> failover (KIP-279)
> - Drop support for Java 7 and remove deprecated code including old scala
> clients
> - Connect REST extension plugin, support for externalizing secrets and
> improved error handling (KIP-285, KIP-297, KIP-298 etc.)
> - Scala API for Kafka Streams and other Streams API improvements
> (KIP-270, KIP-150, KIP-245, KIP-251 etc.)
>
>
> Release notes for the 2.0.0 release:
>
> http://home.apache.org/~rsivaram/kafka-2.0.0-rc2/RELEASE_NOTES.html
> 
>
>
> *** Please download, test and vote by Friday, July 13, 4pm PT
>
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
>
> http://kafka.apache.org/KEYS
> 
>
>
> * Release artifacts to be voted upon (source and binary):
>
> http://home.apache.org/~rsivaram/kafka-2.0.0-rc2/
> 
>
>
> * Maven artifacts to be voted upon:
>
> https://repository.apache.org/content/groups/staging/
> 
>
>
> * Javadoc:
>
> http://home.apache.org/~rsivaram/kafka-2.0.0-rc2/javadoc/
> 
>
>
> * Tag to be voted upon (off 2.0 branch) is the 2.0.0 tag:
>
> https://github.com/apache/kafka/tree/2.0.0-rc2
> 
>
>
>
> * Documentation:
>
> http://kafka.apache.org/20/documentation.html
> 
>
>
> * Protocol:
>
> http://kafka.apache.org/20/protocol.html
> 
>
>
> * Successful Jenkins builds for the 2.0 branch:
>
> Unit/integration tests: https://builds.apache.org/job/kafka-2.0-jdk8/72/
> 
>
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka/job/2.0/27/
> 
>
>
> /**
>
>
> Thanks,
>
>
> Rajini
>


-- 

Brett Rann

Senior DevOps Engineer


Zendesk International Ltd

395 Collins Street, Melbourne VIC 3000 Australia

Mobile: +61 (0) 418 826 017


Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-10 Thread Guozhang Wang
That is a good point...

I cannot think of a better option than documentation and a warning; and given
that, we'd probably better not reuse the function name `until` for the close
time.


Guozhang


On Tue, Jul 10, 2018 at 3:31 PM, John Roesler  wrote:

> I had some opportunity to reflect on the default for close time today...
>
> Note that the current "close time" is equal to the retention time, and
> therefore "close" today shares the default retention of 24h.
>
> It would definitely break any application that today specifies a retention
> time to set close shorter than that time. It's also likely to break apps if
> they *don't* set the retention time and rely on the 24h default. So it's
> unfortunate, but I think if "close" isn't set, we should use the retention
> time instead of a fixed default.
>
> When we ultimately remove the retention time parameter ("until"), we will
> have to set "close" to a default of 24h.
>
> Of course, this has a negative impact on the user of "final results", since
> they won't see any output at all for retentionTime/24h, and may find this
> confusing. What can we do about this except document it well? Maybe log a
> warning if we see that close wasn't explicitly set while using "final
> results"?
>
> Thanks,
> -John
>
> On Tue, Jul 10, 2018 at 10:46 AM John Roesler  wrote:
>
> > Hi Guozhang,
> >
> > That sounds good to me. I'll include that in the KIP.
> >
> > Thanks,
> > -John
> >
> > On Mon, Jul 9, 2018 at 6:33 PM Guozhang Wang  wrote:
> >
> >> Let me clarify a bit on what I meant about moving `retentionPeriod` to
> >> WindowStoreBuilder:
> >>
> >> In another discussion we had around KIP-319 / 330, that the "retention
> >> period" should not really be a window spec, but only a window store
> spec,
> >> as it only affects how long to retain each window to be queryable along
> >> with the storage cost.
> >>
> >> More specifically, today the "maintainMs" returned from Windows is used
> in
> >> three places:
> >>
> >> 1) for windowed aggregations, they are passed in directly into
> >> `Stores.persistentWindows()` as the retention period parameters. For
> this
> >> use case we should just let the WindowStoreBuilder to specify this value
> >> itself.
> >>
> >> NOTE: It is also returned in the KStreamWindowAggregate processor, to
> >> determine if a received record should be dropped due to its lateness. We
> >> may need to think of another way to get this value inside the processor
> >>
> >> 2) for windowed stream-stream join, it is used as the join range
> parameter
> >> but only to check that "windowSizeMs <= retentionPeriodMs". We can do
> this
> >> check at the store builder lever instead of at the processor level.
> >>
> >>
> >> If we can remove its usage in both 1) and 2), then we should be able to
> >> safely remove this from the `Windows` spec.
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Mon, Jul 9, 2018 at 3:53 PM, John Roesler  wrote:
> >>
> >> > Thanks for the reply, Guozhang,
> >> >
> >> > Good! I agree, that is also a good reason, and I actually made use of
> >> that
> >> > in my tests. I'll update the KIP.
> >> >
> >> > By the way, I chose "allowedLateness" as I was trying to pick a better
> >> name
> >> > than "close", but I think it's actually the wrong name. We don't want
> to
> >> > bound the lateness of events in general, only with respect to the end
> of
> >> > their window.
> >> >
> >> > If we have a window [0,10), with "allowedLateness" of 5, then if we
> get
> >> an
> >> > event with timestamp 3 at time 9, the name implies we'd reject it,
> which
> >> > seems silly. Really, we'd only want to start rejecting that event at
> >> stream
> >> > time 15.
> >> >
> >> > What I meant was more like "allowedLatenessAfterWindowEnd", but
> that's
> >> too
> >> > verbose. I think that "close" + some documentation about what it means
> >> will
> >> > be better.
> >> >
> >> > 1: "Close" would be measured from the end of the window, so a
> reasonable
> >> > default would be "0". Recall that "close" really only needs to be
> >> specified
> >> > for final results, and a default of 0 would produce the most intuitive
> >> > results. If folks later discover that they are missing some late
> events,
> >> > they can adjust the parameter accordingly. IMHO, any other value would
> >> just
> >> > be a guess on our part.
> >> >
> >> > 2a:
> >> > I think you're saying to re-use "until" instead of adding "close" to
> the
> >> > window.
> >> >
> >> > The downside here would be that the semantic change could be more
> >> confusing
> >> > than deprecating "until" and introducing window "close" and a
> >> > "retentionTime" on the store builder. The deprecation is a good,
> >> controlled
> >> > way for us to make sure people are getting the semantics they think
> >> they're
> >> > getting, as well as giving us an opportunity to link people to the API
> >> they
> >> > should use instead.
> >> >
> >> > I didn't fully understand the second part, but it sounds like you're
> >> > suggesting to add a

[jira] [Resolved] (KAFKA-5991) Change Consumer per partition lag metrics to put topic-partition-id in tags instead of metric name

2018-07-10 Thread Kevin Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Lu resolved KAFKA-5991.
-
Resolution: Fixed

> Change Consumer per partition lag metrics to put topic-partition-id in tags 
> instead of metric name
> --
>
> Key: KAFKA-5991
> URL: https://issues.apache.org/jira/browse/KAFKA-5991
> Project: Kafka
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.2.0, 0.11.0.0
>Reporter: Kevin Lu
>Priority: Minor
>  Labels: metrics
>
> [KIP-92|https://cwiki.apache.org/confluence/display/KAFKA/KIP-92+-+Add+per+partition+lag+metrics+to+KafkaConsumer]
>  brought per partition lag metrics to {{KafkaConsumer}}, but these metrics 
> put the {{TOPIC-PARTITION_ID}} inside of the metric name itself. These 
> metrics should instead utilize the tags and put {{key="topic-partition"}} and 
> {{value=TOPIC-PARTITION_ID}}. 
> Per-broker (node) and per-topic metrics utilize tags in this way by putting 
> {{key="node/topic"}} and {{value=NODE_ID/TOPIC_NAME}}.
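
(Illustrative sketch only, not the actual change: the same lag metric expressed
first with the partition baked into its name, then with a stable name plus a
topic-partition tag, using the public org.apache.kafka.common.MetricName class.
The group and description strings are examples, not the real ones.)

import java.util.Collections;
import java.util.Map;

import org.apache.kafka.common.MetricName;

class LagMetricNaming {
    // Current style: the topic-partition id is part of the metric name itself.
    static MetricName currentStyle() {
        return new MetricName("test-topic-0.records-lag",
                "consumer-fetch-manager-metrics", "per-partition lag",
                Collections.<String, String>emptyMap());
    }

    // Proposed style: stable metric name, topic-partition carried as a tag.
    static MetricName proposedStyle() {
        Map<String, String> tags = Collections.singletonMap("topic-partition", "test-topic-0");
        return new MetricName("records-lag",
                "consumer-fetch-manager-metrics", "per-partition lag", tags);
    }
}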



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (KAFKA-7141) kafka-consumer-group doesn't describe existing group

2018-07-10 Thread huxihx (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxihx reopened KAFKA-7141:
---
  Assignee: huxihx

> kafka-consumer-group doesn't describe existing group
> 
>
> Key: KAFKA-7141
> URL: https://issues.apache.org/jira/browse/KAFKA-7141
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.11.0.0, 1.0.1
>Reporter: Bohdana Panchenko
>Assignee: huxihx
>Priority: Major
>
> I am running two consumers: akka-stream-kafka consumer with standard config 
> section as described in the 
> [https://doc.akka.io/docs/akka-stream-kafka/current/consumer.html] and  
> kafka-console-consumer.
> akka-stream-kafka consumer configuration looks like this:
> akka.kafka.consumer {
>   kafka-clients {
>     group.id = "myakkastreamkafka-1"
>     enable.auto.commit = false
>   }
> }
>  
>  I am able to see both groups with the command
>  
>  *kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --list*
>  _Note: This will not show information about old Zookeeper-based consumers._
>  
>  _myakkastreamkafka-1_
>  _console-consumer-57171_
> I am able to view details about the console consumer group:
> *kafka-consumer-groups --describe --bootstrap-server 127.0.0.1:9092 --group 
> console-consumer-57171*
>  _Note: This will not show information about old Zookeeper-based consumers._
> _TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID_
>  _STREAM-TEST 0 0 0 0 consumer-1-6b928e07-196a-4322-9928-068681617878 /172.19.0.4 consumer-1_
> But the command to describe my akka stream consumer gives me empty output:
> *kafka-consumer-groups --describe --bootstrap-server 127.0.0.1:9092 --group 
> myakkastreamkafka-1*
>  _Note: This will not show information about old Zookeeper-based consumers._
> _TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID_
>  
> That is strange. Can you please check the issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Kafka Streams new release

2018-07-10 Thread Ayushi Sharma
When is the next Kafka Streams release (Kafka Streams 2.0.0), which was
tentatively scheduled for 26 June?


Re: Requesting Permission To Create KIP And Assign JIRAs

2018-07-10 Thread Kevin Lu
Hi,

I received access to assign JIRAs now, but I still cannot create KIPs.

When I hover over the "Create KIP" button, it says I do not have permission
to create content. When I click the button, it goes to a 404 Not Found page.

Can someone add me?

Thanks!

Regards,
Kevin

On Tue, Jun 26, 2018 at 10:37 PM Kevin Lu  wrote:

> Hi All,
>
> I would like to start contributing to Kafka but I do not have access to
> create KIPs or assign JIRA to myself.
>
> Can someone set it up for me?
>
> Confluence id: lu.kevin
> Jira username: lu.kevin
>
> Email: lu.ke...@berkeley.edu
>
> Thanks!
>
> Regards,
> Kevin
>


-- 
Kevin Li Lu
University of California, Berkeley | Class of 2017
College of Letters & Sciences | B.A. Computer Science
Cell: (408) 609-6238


Re: Kafka Streams new release

2018-07-10 Thread Ofir Manor
From the release plan:
  " While the target release date is fixed at ~2w after code freeze, RCs
will roll out as needed until the release vote passes"
   https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80448820
You can follow the voting threads on this mailing list - the current thread
is "[VOTE] 2.0.0 RC2"
Once a vote for RC passes, that RC will be released as the 2.0.0 version.

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Wed, Jul 11, 2018 at 9:21 AM, Ayushi Sharma  wrote:

> When is the next Kafka Streams release (Kafka Streams 2.0.0) which was
> tentatively on 26June?
>