[jira] [Resolved] (KAFKA-2561) Optionally support OpenSSL for SSL/TLS

2022-02-13 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-2561.

Resolution: Won't Do

> Optionally support OpenSSL for SSL/TLS 
> ---
>
> Key: KAFKA-2561
> URL: https://issues.apache.org/jira/browse/KAFKA-2561
> Project: Kafka
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Ismael Juma
>Priority: Major
>
> JDK's `SSLEngine` is unfortunately a bit slow (KAFKA-2431 covers this in more 
> detail). We should consider supporting OpenSSL for SSL/TLS. Initial 
> experiments on my laptop show that it performs a lot better:
> {code}
> start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, 
> nMsg.sec, config
> 2015-09-21 14:41:58:245, 2015-09-21 14:47:02:583, 28610.2295, 94.0081, 
> 3000, 98574.6111, Java 8u60/server auth JDK 
> SSLEngine/TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
> 2015-09-21 14:38:24:526, 2015-09-21 14:40:19:941, 28610.2295, 247.8900, 
> 3000, 259931.5514, Java 8u60/server auth 
> OpenSslEngine/TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> 2015-09-21 14:49:03:062, 2015-09-21 14:50:27:764, 28610.2295, 337.7751, 
> 3000, 354182.9000, Java 8u60/plaintext
> {code}
> Extracting the throughput figures:
> * JDK SSLEngine: 94 MB/s
> * Netty OpenSslEngine: 247 MB/s
> * Plaintext: 337 MB/s (code from trunk, so no zero-copy due to KAFKA-2517)
> In order to get these figures, I used Netty's `OpenSslEngine` by hacking 
> `SSLFactory` to use Netty's `SslContextBuilder` and made a few changes to 
> `SSLTransportLayer` in order to work around differences in behaviour between 
> `OpenSslEngine` and JDK's SSLEngine (filed 
> https://github.com/netty/netty/issues/4235 and 
> https://github.com/netty/netty/issues/4238 upstream).
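
For reference, a minimal sketch of obtaining an OpenSSL-backed engine from
Netty's `SslContextBuilder` (this assumes netty-handler and netty-tcnative on
the classpath; the class name and key/certificate paths are illustrative, not
the actual `SSLFactory` change described above):
{code}
import io.netty.buffer.ByteBufAllocator;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;

import javax.net.ssl.SSLEngine;
import java.io.File;

public class OpenSslEngineSketch {
    public static SSLEngine newServerEngine() throws Exception {
        // Build a server-side SslContext backed by OpenSSL (netty-tcnative)
        // instead of the JDK provider. File names are placeholders.
        SslContext sslContext = SslContextBuilder
                .forServer(new File("server.crt"), new File("server.key"))
                .sslProvider(SslProvider.OPENSSL)
                .build();
        // The resulting engine implements the standard javax.net.ssl.SSLEngine
        // API, so it can replace the JDK engine in existing transport code.
        return sslContext.newEngine(ByteBufAllocator.DEFAULT);
    }
}
{code}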



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-7304) memory leakage in org.apache.kafka.common.network.Selector

2022-02-13 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-7304.

Fix Version/s: 2.0.1
 Assignee: Rajini Sivaram
   Resolution: Fixed

Marking as fixed since the leaks were fixed.

> memory leakage in org.apache.kafka.common.network.Selector
> --
>
> Key: KAFKA-7304
> URL: https://issues.apache.org/jira/browse/KAFKA-7304
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Yu Yang
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.0.1
>
> Attachments: 7304.v4.txt, 7304.v7.txt, Screen Shot 2018-08-16 at 
> 11.04.16 PM.png, Screen Shot 2018-08-16 at 11.06.38 PM.png, Screen Shot 
> 2018-08-16 at 12.41.26 PM.png, Screen Shot 2018-08-16 at 4.26.19 PM.png, 
> Screen Shot 2018-08-17 at 1.03.35 AM.png, Screen Shot 2018-08-17 at 1.04.32 
> AM.png, Screen Shot 2018-08-17 at 1.05.30 AM.png, Screen Shot 2018-08-28 at 
> 11.09.45 AM.png, Screen Shot 2018-08-29 at 10.49.03 AM.png, Screen Shot 
> 2018-08-29 at 10.50.47 AM.png, Screen Shot 2018-09-29 at 10.38.12 PM.png, 
> Screen Shot 2018-09-29 at 10.38.38 PM.png, Screen Shot 2018-09-29 at 8.34.50 
> PM.png
>
>
> We are testing secured writes to Kafka through SSL. At small scale, SSL 
> writing to Kafka was fine. However, when we enabled SSL writing at a larger 
> scale (>40k clients writing concurrently), the Kafka brokers soon hit an 
> OutOfMemory issue with a 4G heap setting. We tried increasing the heap size 
> to 10GB, but encountered the same issue. 
> We took a few heap dumps and found that most of the heap memory is 
> referenced through org.apache.kafka.common.network.Selector objects. There 
> are two channel map fields in Selector. It seems that somehow the objects 
> are not removed from these maps in a timely manner. 
> One observation is that the memory leak seems related to Kafka partition 
> leader changes. If there is a broker restart or similar event in the cluster 
> that causes partition leadership changes, the brokers may hit the OOM issue 
> faster. 
> {code}
> private final Map<String, KafkaChannel> channels;
> private final Map<String, KafkaChannel> closingChannels;
> {code}
> Please see the attached images and the following link for a sample GC 
> analysis. 
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMDgvMTcvLS1nYy5sb2cuMC5jdXJyZW50Lmd6LS0yLTM5LTM0
> The command line for running Kafka: 
> {code}
> java -Xms10g -Xmx10g -XX:NewSize=512m -XX:MaxNewSize=512m 
> -Xbootclasspath/p:/usr/local/libs/bcp -XX:MetaspaceSize=128m -XX:+UseG1GC 
> -XX:MaxGCPauseMillis=25 -XX:InitiatingHeapOccupancyPercent=35 
> -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=25 
> -XX:MaxMetaspaceFreeRatio=75 -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
> -XX:+PrintTenuringDistribution -Xloggc:/var/log/kafka/gc.log 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=40 -XX:GCLogFileSize=50M 
> -Djava.awt.headless=true 
> -Dlog4j.configuration=file:/etc/kafka/log4j.properties 
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.port= 
> -Dcom.sun.management.jmxremote.rmi.port= -cp /usr/local/libs/*  
> kafka.Kafka /etc/kafka/server.properties
> {code}
> We use Java 1.8.0_102 and have applied a TLS patch that reduces the 
> X509Factory.certCache map size from 750 to 20. 
> {code}
> java -version
> java version "1.8.0_102"
> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
> {code}
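
To illustrate the two maps quoted above, here is a simplified sketch of the
channel-tracking pattern (not the actual Kafka Selector code): a channel being
closed is moved from `channels` to `closingChannels`, and a missed removal on
either path keeps the channel and everything it references reachable from the
Selector.
{code}
import java.util.HashMap;
import java.util.Map;

// Simplified sketch only; the real class is org.apache.kafka.common.network.Selector.
class SelectorSketch {
    private final Map<String, Object> channels = new HashMap<>();
    private final Map<String, Object> closingChannels = new HashMap<>();

    void register(String id, Object channel) {
        channels.put(id, channel);
    }

    void beginClose(String id) {
        Object channel = channels.remove(id);
        if (channel != null) {
            // Channels with in-flight data are parked here until drained.
            closingChannels.put(id, channel);
        }
    }

    void completeClose(String id) {
        // If this step is delayed or skipped (e.g. during reconnect storms
        // triggered by leadership changes), the entry lingers here and keeps
        // its buffers reachable, matching the retention seen in the heap dumps.
        closingChannels.remove(id);
    }
}
{code}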



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-4231) ZK metadata went inconsistent when migrating 0.10.0.0 -> 0.10.0.1

2022-02-13 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4231.

Resolution: Cannot Reproduce

> ZK metadata went inconsistent when migrating 0.10.0.0 -> 0.10.0.1
> -
>
> Key: KAFKA-4231
> URL: https://issues.apache.org/jira/browse/KAFKA-4231
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
> Environment: 3-node cluster on AWS, 700+ topics created and dropped 
> on a regular basis; we use the Java Kafka client 0.9.0.1 and ZkUtils 0.9.0.1 
> to create/drop topics automatically
>Reporter: Sergey Alaev
>Priority: Major
>
> 1. We updated our Kafka cluster from 0.10.0.0 to 0.10.0.1 six days ago.
> 2. Yesterday, we got a huge amount of server logs looking like:
> [2016-09-29 12:33:27,055] ERROR [ReplicaFetcherThread-0-1], Error for 
> partition 
> [test-customer-recipient-4115239346516610_test-recipient-4115239346516610,0] 
> to broker 1:org.apache.kafka.common.errors
> .UnknownTopicOrPartitionException: This server does not host this 
> topic-partition. (kafka.server.ReplicaFetcherThread)
> 3. Today I restarted all three nodes and they failed to start, throwing the 
> exception shown below.
> Investigation showed that in ZK:
> /config/topics contained 729 entries
> /admin/delete_topics contained 484 entries, all of them present in 
> /config/topics
> /brokers/topics was missing 6 entries that were present in /config/topics! 
> This was the cause of the startup failure.
> Removing those 6 entries from /config/topics fixed the issue.
> I'm sure that we didn't change ZK data manually; the only possible culprits 
> are kafka-server 0.10.0.1 and ZkUtils 0.9.0.1 (used only to delete topics).
> [2016-09-29 12:52:11,082] ERROR Error while electing or becoming leader on 
> broker 2 (kafka.server.ZookeeperLeaderElector)
> kafka.common.KafkaException: Can't parse json string: null
> at kafka.utils.Json$.liftedTree1$1(Json.scala:40)
> at kafka.utils.Json$.parseFull(Json.scala:36)
> at 
> kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:610)
> at 
> kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:606)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> kafka.utils.ZkUtils.getReplicaAssignmentForTopics(ZkUtils.scala:606)
> at 
> kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:744)
> at 
> kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:333)
> at 
> kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:166)
> at 
> kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141)
> at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:824)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> Caused by: java.lang.NullPointerException
> Server configuration:
> [2016-09-29 12:40:58,190] INFO KafkaConfig values:
> advertised.host.name = null
> metric.reporters = []
> quota.producer.default = 9223372036854775807
> offsets.topic.num.partitions = 50
> log.flush.interval.messages = 9223372036854775807
> auto.create.topics.enable = true
> controller.socket.timeout.ms = 3
> log.flush.interval.ms = null
> principal.builder.class = class 
> org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
> replica.socket.receive.buffer.bytes = 65536
> min.insync.replicas = 2
> replica.fetch.wait.max.ms = 500
> num.recovery.threads.per.data.dir = 1
> ssl.keystore.type = JKS
> sasl.mechanism.inter.broker.protocol = GSSAPI
> default.replication.factor = 3
> ssl.truststore.password = [hidden]
> log.preallocate = false
> sasl.kerberos.principal.to.local.rules = [DEFAULT]
> fetch.purgatory.purge.interval.requests = 1000
> ssl.endpoint.identification.algorithm = null
> replica.socket.timeout.ms = 3
>  
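
One quick way to spot the inconsistency described above is to compare the
topic lists under the relevant ZK paths. A rough sketch using the same
ZkClient library that appears in the stack trace (the connect string and
timeouts are placeholders):
{code}
import org.I0Itec.zkclient.ZkClient;

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ZkTopicConsistencyCheck {
    public static void main(String[] args) {
        // Placeholder connect string and timeouts.
        ZkClient zkClient = new ZkClient("localhost:2181", 10000, 10000);
        try {
            List<String> configTopics = zkClient.getChildren("/config/topics");
            Set<String> brokerTopics =
                    new HashSet<>(zkClient.getChildren("/brokers/topics"));
            for (String topic : configTopics) {
                if (!brokerTopics.contains(topic)) {
                    // Entries like these are what made the controller fail on startup.
                    System.out.println("In /config/topics but missing from /brokers/topics: " + topic);
                }
            }
        } finally {
            zkClient.close();
        }
    }
}
{code}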

[jira] [Resolved] (KAFKA-6706) NETWORK_EXCEPTION and REQUEST_TIMED_OUT in mirrormaker producer after 1.0 broker upgrade

2022-02-13 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-6706.

Fix Version/s: 1.1.1
   Resolution: Fixed

> NETWORK_EXCEPTION and REQUEST_TIMED_OUT in mirrormaker producer after 1.0 
> broker upgrade
> 
>
> Key: KAFKA-6706
> URL: https://issues.apache.org/jira/browse/KAFKA-6706
> Project: Kafka
>  Issue Type: Bug
>  Components: core, network
>Affects Versions: 1.0.0
>Reporter: Di Shang
>Priority: Blocker
>  Labels: mirror-maker
> Fix For: 1.1.1
>
>
> We have 2 clusters, A and B, with 4 brokers each, and we use MirrorMaker to 
> replicate topics from A to B. 
> We recently upgraded our brokers from 0.10.2.0 to 1.0.0. After the upgrade, 
> we started seeing the MirrorMaker task report producer errors and 
> intermittently die. 
> We tried both the 1.0.0 and 0.10.2.0 MirrorMaker; both have the same problem. 
> After downgrading the cluster B brokers back to 0.10.2.0 the problem went 
> away, so we think it's a server-side problem.
> There are 2 types of errors: REQUEST_TIMED_OUT and NETWORK_EXCEPTION. For 
> testing, I used a topic *logging* with 20 partitions and 3 replicas (same on 
> clusters A and B); the source topic has 50+ million messages.
> (The log below is from MirrorMaker 1.0 at INFO level; the 0.10.2.0 log is 
> very similar.)
> {noformat}
> 22 Mar 2018 02:16:07.407 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 35122 on 
> topic-partition logging-7, retrying (2147483646 attempts left). Error: 
> REQUEST_TIMED_OUT
>  22 Mar 2018 02:17:49.731 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 51572 on 
> topic-partition logging-7, retrying (2147483646 attempts left). Error: 
> REQUEST_TIMED_OUT
>  22 Mar 2018 02:18:33.903 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 57785 on 
> topic-partition logging-5, retrying (2147483646 attempts left). Error: 
> REQUEST_TIMED_OUT
>  22 Mar 2018 02:21:21.399 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 85406 on 
> topic-partition logging-18, retrying (2147483646 attempts left). Error: 
> REQUEST_TIMED_OUT
>  22 Mar 2018 02:25:22.278 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 128047 on 
> topic-partition logging-5, retrying (2147483646 attempts left). Error: 
> REQUEST_TIMED_OUT
>  22 Mar 2018 02:26:17.154 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 137049 on 
> topic-partition logging-18, retrying (2147483646 attempts left). Error: 
> REQUEST_TIMED_OUT
>  22 Mar 2018 02:27:57.358 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 153976 on 
> topic-partition logging-5, retrying (2147483646 attempts left). Error: 
> REQUEST_TIMED_OUT
>  22 Mar 2018 02:27:57.779 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 154077 on 
> topic-partition logging-2, retrying (2147483646 attempts left). Error: 
> NETWORK_EXCEPTION
>  22 Mar 2018 02:27:57.780 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 154077 on 
> topic-partition logging-10, retrying (2147483646 attempts left). Error: 
> NETWORK_EXCEPTION
>  22 Mar 2018 02:27:57.780 [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.producer.internals.Sender warn(line:251) [Producer 
> clientId=producer-1] Got error produce response with correlation id 154077 on 
> topic-partition logging-18, retrying (2147483646 attempts left). Error: 
> NETWORK_EXCEPTION
>  22 Mar 2018 02:27:57.781 [kafka-producer-network-thread | producer-1] WARN 
> org.apach
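
The "2147483646 attempts left" in the warnings above suggests the MirrorMaker
producer is running with retries effectively set to Integer.MAX_VALUE. A rough
sketch of the producer settings that govern this retry/timeout behaviour (the
values shown are illustrative, not a recommended fix):
{code}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

import java.util.Properties;

public class MirrorProducerSketch {
    public static KafkaProducer<byte[], byte[]> create(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        // Effectively infinite retries; every REQUEST_TIMED_OUT or
        // NETWORK_EXCEPTION response is retried, as seen in the log above.
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        // How long a produce request may wait for a broker response before
        // failing with REQUEST_TIMED_OUT.
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
        // Preserve ordering across retries.
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        return new KafkaProducer<>(props);
    }
}
{code}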

Re: [DISCUSS] Should we automatically close stale PRs?

2022-02-13 Thread Luke Chen
Hi Guozhang,

I had a quick search through the available GitHub Actions and didn't find any
notification-related ones.
However, there's a GitHub Action option to automatically leave a comment
before closing a PR. [1]
So, at the least, we can leave a comment that notifies the PR
participants.
If the auto-comment option supports mentioning users (i.e. @user), we can
notify anyone we want.

[1] https://github.com/actions/stale#close-pr-message

Thanks.
Luke


On Fri, Feb 11, 2022 at 6:36 AM Guozhang Wang  wrote:

> Just going back to the PRs, @David Jacot, do you know if the actions/stale
>  tool is able to send notifications to
> pre-configured recipients when closing stale PRs?
>
> On Wed, Feb 9, 2022 at 9:21 PM Matthias J. Sax 
> wrote:
>
> > Nikolay,
> >
> > thanks for helping out!
> >
> > > First, I thought it’s an author job to keep KIP status up to date.
> > > But, it can be tricky to determine actual KIP status because of lack of
> > feedback from committers
> >
> > Yes, it is the author's task, but it's also the author's task to keep
> > the discussion alive (what -- to be fair -- can be hard). We had some
> > great contributions thought that took very long, but the KIP author kept
> > following up and thus signaling that they still have interest. Just
> > going silent and effectively dropping a KIP is not the best way (even if
> > I can understand that it sometime frustrating and some people just walk
> > away).
> >
> >
> > > Second - the other issue is determine - what KIP just wait for a hero
> to
> > implement it, and what just wrong idea or something similar.
> >
> > As pointed out on the KIP wiki page, if somebody is not willing to
> > implement the KIP, they should not even start it. Getting a KIP voted
> > but not finishing it, is not really helping the project.
> >
> > About "just the wrong idea": this also happens, but usually it's clear
> > quite quickly if people raise concerns about an idea.
> >
> >
> > -Matthias
> >
> >
> > On 2/9/22 12:13, Nikolay Izhikov wrote:
> > >> Thanks for that list, Nikolay,
> > >
> > > Thank, John.
> > >
> > > I made a second round of digging through abandoned PR’s.
> > > Next pack, that should be closed:
> > >
> > > https://github.com/apache/kafka/pull/1291
> > > https://github.com/apache/kafka/pull/1323
> > > https://github.com/apache/kafka/pull/1412
> > > https://github.com/apache/kafka/pull/1757
> > > https://github.com/apache/kafka/pull/1741
> > > https://github.com/apache/kafka/pull/1715
> > > https://github.com/apache/kafka/pull/1668
> > > https://github.com/apache/kafka/pull/1666
> > > https://github.com/apache/kafka/pull/1661
> > > https://github.com/apache/kafka/pull/1626
> > > https://github.com/apache/kafka/pull/1624
> > > https://github.com/apache/kafka/pull/1608
> > > https://github.com/apache/kafka/pull/1606
> > > https://github.com/apache/kafka/pull/1582
> > > https://github.com/apache/kafka/pull/1522
> > > https://github.com/apache/kafka/pull/1516
> > > https://github.com/apache/kafka/pull/1493
> > > https://github.com/apache/kafka/pull/1473
> > > https://github.com/apache/kafka/pull/1870
> > > https://github.com/apache/kafka/pull/1883
> > > https://github.com/apache/kafka/pull/1893
> > > https://github.com/apache/kafka/pull/1894
> > > https://github.com/apache/kafka/pull/1912
> > > https://github.com/apache/kafka/pull/1933
> > > https://github.com/apache/kafka/pull/1983
> > > https://github.com/apache/kafka/pull/1984
> > > https://github.com/apache/kafka/pull/2017
> > > https://github.com/apache/kafka/pull/2018
> > >
> > >> 9 февр. 2022 г., в 22:37, John Roesler 
> > написал(а):
> > >>
> > >> Thanks for that list, Nikolay,
> > >>
> > >> I've just closed them all.
> > >>
> > >> And thanks to you all for working to keep Kafka development
> > >> healthy!
> > >>
> > >> -John
> > >>
> > >> On Wed, 2022-02-09 at 14:19 +0300, Nikolay Izhikov wrote:
> > >>> Hello, guys.
> > >>>
> > >>> I made a quick search throw oldest PRs.
> > >>> Looks like the following list can be safely closed.
> > >>>
> > >>> Committers, can you, please, push the actual «close» button for this
> > list of PRs?
> > >>>
> > >>> https://github.com/apache/kafka/pull/560
> > >>> https://github.com/apache/kafka/pull/200
> > >>> https://github.com/apache/kafka/pull/62
> > >>> https://github.com/apache/kafka/pull/719
> > >>> https://github.com/apache/kafka/pull/735
> > >>> https://github.com/apache/kafka/pull/757
> > >>> https://github.com/apache/kafka/pull/824
> > >>> https://github.com/apache/kafka/pull/880
> > >>> https://github.com/apache/kafka/pull/907
> > >>> https://github.com/apache/kafka/pull/983
> > >>> https://github.com/apache/kafka/pull/1035
> > >>> https://github.com/apache/kafka/pull/1078
> > >>> https://github.com/apache/kafka/pull/
> > >>> https://github.com/apache/kafka/pull/1135
> > >>> https://github.com/apache/kafka/pull/1147
> > >>> https://github.com/apache/kafka/pull/1150
> > >>> https://github.com/apache/kafka/pull/1244
> > >>> https://github.com/apache/ka

Re: [DISCUSS] Should we automatically close stale PRs?

2022-02-13 Thread Guozhang Wang
Thanks for the investigation, Luke!

I think just pinging the committers' IDs in a comment is sufficient.

On Sun, Feb 13, 2022 at 5:00 AM Luke Chen  wrote:

> Hi Guozhang,
>
> Had a quick search on the github action, didn't find any notification
> related actions.
> However, there's a github action to auto leave a comment before closing
> PR.[1]
> So, I think at least, we can leave a comment that notify the PR
> participants.
> If the auto comment action can support mention users (i.e. @ user), we can
> notify anyone we want.
>
> [1] https://github.com/actions/stale#close-pr-message
>
> Thanks.
> Luke
>
>
> On Fri, Feb 11, 2022 at 6:36 AM Guozhang Wang  wrote:
>
> > Just going back to the PRs, @David Jacot, do you know if the
> actions/stale
> >  tool is able to send notifications to
> > pre-configured recipients when closing stale PRs?
> >
> > On Wed, Feb 9, 2022 at 9:21 PM Matthias J. Sax  >
> > wrote:
> >
> > > Nikolay,
> > >
> > > thanks for helping out!
> > >
> > > > First, I thought it’s an author job to keep KIP status up to date.
> > > > But, it can be tricky to determine actual KIP status because of lack
> of
> > > feedback from committers
> > >
> > > Yes, it is the author's task, but it's also the author's task to keep
> > > the discussion alive (what -- to be fair -- can be hard). We had some
> > > great contributions thought that took very long, but the KIP author
> kept
> > > following up and thus signaling that they still have interest. Just
> > > going silent and effectively dropping a KIP is not the best way (even
> if
> > > I can understand that it sometime frustrating and some people just walk
> > > away).
> > >
> > >
> > > > Second - the other issue is determine - what KIP just wait for a hero
> > to
> > > implement it, and what just wrong idea or something similar.
> > >
> > > As pointed out on the KIP wiki page, if somebody is not willing to
> > > implement the KIP, they should not even start it. Getting a KIP voted
> > > but not finishing it, is not really helping the project.
> > >
> > > About "just the wrong idea": this also happens, but usually it's clear
> > > quite quickly if people raise concerns about an idea.
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 2/9/22 12:13, Nikolay Izhikov wrote:
> > > >> Thanks for that list, Nikolay,
> > > >
> > > > Thank, John.
> > > >
> > > > I made a second round of digging through abandoned PR’s.
> > > > Next pack, that should be closed:
> > > >
> > > > https://github.com/apache/kafka/pull/1291
> > > > https://github.com/apache/kafka/pull/1323
> > > > https://github.com/apache/kafka/pull/1412
> > > > https://github.com/apache/kafka/pull/1757
> > > > https://github.com/apache/kafka/pull/1741
> > > > https://github.com/apache/kafka/pull/1715
> > > > https://github.com/apache/kafka/pull/1668
> > > > https://github.com/apache/kafka/pull/1666
> > > > https://github.com/apache/kafka/pull/1661
> > > > https://github.com/apache/kafka/pull/1626
> > > > https://github.com/apache/kafka/pull/1624
> > > > https://github.com/apache/kafka/pull/1608
> > > > https://github.com/apache/kafka/pull/1606
> > > > https://github.com/apache/kafka/pull/1582
> > > > https://github.com/apache/kafka/pull/1522
> > > > https://github.com/apache/kafka/pull/1516
> > > > https://github.com/apache/kafka/pull/1493
> > > > https://github.com/apache/kafka/pull/1473
> > > > https://github.com/apache/kafka/pull/1870
> > > > https://github.com/apache/kafka/pull/1883
> > > > https://github.com/apache/kafka/pull/1893
> > > > https://github.com/apache/kafka/pull/1894
> > > > https://github.com/apache/kafka/pull/1912
> > > > https://github.com/apache/kafka/pull/1933
> > > > https://github.com/apache/kafka/pull/1983
> > > > https://github.com/apache/kafka/pull/1984
> > > > https://github.com/apache/kafka/pull/2017
> > > > https://github.com/apache/kafka/pull/2018
> > > >
> > > >> 9 февр. 2022 г., в 22:37, John Roesler 
> > > написал(а):
> > > >>
> > > >> Thanks for that list, Nikolay,
> > > >>
> > > >> I've just closed them all.
> > > >>
> > > >> And thanks to you all for working to keep Kafka development
> > > >> healthy!
> > > >>
> > > >> -John
> > > >>
> > > >> On Wed, 2022-02-09 at 14:19 +0300, Nikolay Izhikov wrote:
> > > >>> Hello, guys.
> > > >>>
> > > >>> I made a quick search throw oldest PRs.
> > > >>> Looks like the following list can be safely closed.
> > > >>>
> > > >>> Committers, can you, please, push the actual «close» button for
> this
> > > list of PRs?
> > > >>>
> > > >>> https://github.com/apache/kafka/pull/560
> > > >>> https://github.com/apache/kafka/pull/200
> > > >>> https://github.com/apache/kafka/pull/62
> > > >>> https://github.com/apache/kafka/pull/719
> > > >>> https://github.com/apache/kafka/pull/735
> > > >>> https://github.com/apache/kafka/pull/757
> > > >>> https://github.com/apache/kafka/pull/824
> > > >>> https://github.com/apache/kafka/pull/880
> > > >>> https://github.com/apache/kafka/pull/907
> > > >>> https://github.c

Re: [DISCUSS] KIP-822: Optimize the semantics of KafkaConsumer#pause to be consistent between the two RebalanceProtocols

2022-02-13 Thread Guozhang Wang
Hello Riven,


Thanks for bringing up this proposal. As we discussed on the JIRA, I'm
personally in favor of this fix. But if all the proposed changes are in
`ConsumerCoordinator`, then we do not need a KIP, since that class is
internal only.


Guozhang

On Sat, Feb 12, 2022 at 1:35 AM Riven Sun  wrote:

> Sorry, I sent this email via GMail. Refer to the contents of other people's
> DISSCUSS emails. Mistakenly introduced someone else's KIP.
>
> The KIP related to this DISCUSS is
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199534763
>
> Thank you for your kindness
> RivenSun
>
> On Sat, Feb 12, 2022 at 5:32 PM Riven Sun  wrote:
>
> >
> >> Sorry, I sent this email via GMail. Refer to the contents of other
> >> people's DISSCUSS emails. Mistakenly introduced someone else's KIP.
> >>
> >> The KIP related to this DISCUSS is
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199534763
> >>
> >> Thank you for your kindness
> >> RivenSun
> >>
> >
>


-- 
-- Guozhang


Re: [VOTE] KIP-770: Replace "buffered.records.per.partition" with "input.buffer.max.bytes"

2022-02-13 Thread Guozhang Wang
Hi Sagar,

Looks good to define it in the NamedCacheMetrics. Though since this is an
internal implementation detail and neither of the classes are public, we do
not actually need to define it in the KIP :)


Guozhang

On Sat, Feb 12, 2022 at 4:22 AM Sagar  wrote:

> @Guozhang,
>
> I have sent an update to this KIP. I have a question though.. Should this
> new metric be defined in TaskMetrics level or NamedCacheMetrics? I think
> the latter makes sense as that holds the cache size at a task level and
> exposes some other cache related metrics as well like hit-ratio.
>
> Thanks!
> Sagar.
>
>
> On Sat, Feb 12, 2022 at 1:14 PM Sagar  wrote:
>
> > Hi All,
> >
> > There's another amendment proposed for this KIP. We are adding a new
> > metric type called *cache-size-bytes-total  *to capture the cache size in
> > bytes accumulated by a task.
> >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
> >
> > Thanks!
> > Sagar.
> >
> > On Mon, Jan 24, 2022 at 7:55 AM Guozhang Wang 
> wrote:
> >
> >> Thanks Sagar, I'm +1 on the renamed metric.
> >>
> >> On Sat, Jan 22, 2022 at 6:56 PM Sagar 
> wrote:
> >>
> >> > Hi All,
> >> >
> >> > There is a small update to the KIP whereby the newly introduced metric
> >> > *total-bytes
> >> > *has been renamed to *input-buffer-bytes-total.*
> >> >
> >> > Thanks!
> >> > Sagar.
> >> >
> >> > On Wed, Sep 29, 2021 at 9:57 AM Sagar 
> >> wrote:
> >> >
> >> > > We have 3 binding votes: Sophie/Guozhang/Mathias
> >> > > and 2 non-binding votes: Josep/Luke and no -1 votes.
> >> > >
> >> > > Marking this KIP as accepted.
> >> > >
> >> > > Thanks everyone!
> >> > >
> >> > > Thanks!
> >> > > Sagar.
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Sep 29, 2021 at 7:18 AM Matthias J. Sax 
> >> > wrote:
> >> > >
> >> > >> +1 (binding)
> >> > >>
> >> > >> On 9/28/21 10:40 AM, Sagar wrote:
> >> > >> > Hi All,
> >> > >> >
> >> > >> > Bumping this vote thread again!
> >> > >> >
> >> > >> > Thanks!
> >> > >> > Sagar.
> >> > >> >
> >> > >> > On Wed, Sep 8, 2021 at 1:19 PM Luke Chen 
> >> wrote:
> >> > >> >
> >> > >> >> Thanks for the KIP.
> >> > >> >>
> >> > >> >> + 1 (non-binding)
> >> > >> >>
> >> > >> >> Thanks.
> >> > >> >> Luke
> >> > >> >>
> >> > >> >> On Wed, Sep 8, 2021 at 2:48 PM Josep Prat
> >> >  >> > >> >
> >> > >> >> wrote:
> >> > >> >>
> >> > >> >>> +1 (non binding).
> >> > >> >>>
> >> > >> >>> Thanks for the KIP Sagar!
> >> > >> >>> ———
> >> > >> >>> Josep Prat
> >> > >> >>>
> >> > >> >>> Aiven Deutschland GmbH
> >> > >> >>>
> >> > >> >>> Immanuelkirchstraße 26, 10405 Berlin
> >> > >> >>>
> >> > >> >>> Amtsgericht Charlottenburg, HRB 209739 B
> >> > >> >>>
> >> > >> >>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >> > >> >>>
> >> > >> >>> m: +491715557497
> >> > >> >>>
> >> > >> >>> w: aiven.io
> >> > >> >>>
> >> > >> >>> e: josep.p...@aiven.io
> >> > >> >>>
> >> > >> >>> On Wed, Sep 8, 2021, 03:29 Sophie Blee-Goldman
> >> > >> >>  >> > >> 
> >> > >> >>> wrote:
> >> > >> >>>
> >> > >>  +1 (binding)
> >> > >> 
> >> > >>  Thanks for the KIP!
> >> > >> 
> >> > >>  -Sophie
> >> > >> 
> >> > >>  On Tue, Sep 7, 2021 at 1:59 PM Guozhang Wang <
> >> wangg...@gmail.com>
> >> > >> >> wrote:
> >> > >> 
> >> > >> > Thanks Sagar, +1 from me.
> >> > >> >
> >> > >> >
> >> > >> > Guozhang
> >> > >> >
> >> > >> > On Sat, Sep 4, 2021 at 10:29 AM Sagar <
> >> sagarmeansoc...@gmail.com>
> >> > >> >>> wrote:
> >> > >> >
> >> > >> >> Hi All,
> >> > >> >>
> >> > >> >> I would like to start a vote on the following KIP:
> >> > >> >>
> >> > >> >>
> >> > >> >
> >> > >> 
> >> > >> >>>
> >> > >> >>
> >> > >>
> >> >
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
> >> > >> >>
> >> > >> >> Thanks!
> >> > >> >> Sagar.
> >> > >> >>
> >> > >> >
> >> > >> >
> >> > >> > --
> >> > >> > -- Guozhang
> >> > >> >
> >> > >> 
> >> > >> >>>
> >> > >> >>
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
>


-- 
-- Guozhang
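
For anyone who wants to check the metric once it lands, a rough sketch of
looking it up by name through the public KafkaStreams#metrics() API (the
metric name below assumes the cache-size-bytes-total naming discussed in this
thread):
{code}
import org.apache.kafka.streams.KafkaStreams;

public class CacheSizeMetricSketch {
    // Prints the proposed per-task cache size metric, if/once it is exposed
    // under the name discussed above.
    public static void printCacheSizes(KafkaStreams streams) {
        streams.metrics().forEach((name, metric) -> {
            if ("cache-size-bytes-total".equals(name.name())) {
                System.out.println(name.tags() + " -> " + metric.metricValue());
            }
        });
    }
}
{code}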


[GitHub] [kafka-site] showuon merged pull request #396: MINOR: Add CVE-2022-23302 and CVE-2022-23305 to cve-list

2022-02-13 Thread GitBox


showuon merged pull request #396:
URL: https://github.com/apache/kafka-site/pull/396


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: [DISCUSS] KIP-812: Introduce another form of the `KafkaStreams.close()` API that forces the member to leave the consumer group

2022-02-13 Thread Guozhang Wang
Thanks Seung-chan!

Re: "As I read the code, we can set a consumer to leave the group while
shutting
down the thread, using `StreamThread#requestLeaveGroupDuringShutdown`
method. Is it enough to call that method on every thread created in
`KafkaStream` sometime before we call `StreamThread#shutdown`?"

I think you are right. We can bypass API changes in consumers for this KIP.

Re: "I updated the `Public Interfaces` section in the KIP to specify full
signature. As you can see, I suggested putting the `CloseOptions` class in
the `KafkaStream` class. I feel like it'd be too general name to put it in
a separate file."

Re-read the updated KIP, looks good to me!


Guozhang

On Fri, Feb 11, 2022 at 11:21 PM Seung-chan Ahn 
wrote:

> (re-sending with the better syntax of quotation)
>
> Hello Guozhang,
>
> Thanks for your rich comment! I'd carefully read through all you mentioned.
> I updated the `Public Interfaces` section of KIP and this is what I think:
>
> @Guozhang: Today we use the timeout to try to tackle all three cases, but
> > ideally we want the client to submit extra information to help
> distinguish
> > them. I.e. we just use timeout for case 1) only, while we use separate
> > mechanisms to differentiate 2) and 3) from it. Personally I think we
> could
> > consider having an augmented leave-group (or maybe in the long run, we
> can
> > merge that RPC as part of heartbeat) with a flag indicating 2) or 3),
> while
> > just relying on the timeout for case 1).
>
>
> I know you proposed this idea in a wider scope than this KIP, but it'd be
> worth keeping the discussion. I've thought about the idea of `augmented
> leave-group with a flag indicating 2) or 3)`. In the case that a bouncing
> consumer requested, with a flag, to leave the group, and unfortunately, it
> failed to restart, I guess the group’s coordinator still needs to drop the
> consumer after some while. And by its nature, the coordinator would wait
> for the consumer till the timeout reached. Eventually, it seems like not
> really different from the case the consumer restarts and prays the timeout
> is enough. In my very naive thought, `augmented leave-group with a flag
> indicating 2)` is not supposed to be a request to leave the group but the
> one for being exempted from the timeout. So I’d rather consider having a
> request to extend the timeout for one time instead.
>
> @Guozhang: 1. Regarding the API change, I feel just doing that on the
> > streams side is not enough since by the end of the day we still need the
> > consumer to incorporate it (today it's via a static config and hence we
> > cannot just dynamically change the config).
>
>
> As I read the code, we can set a consumer to leave the group while shutting
> down the thread, using `StreamThread#requestLeaveGroupDuringShutdown`
> method. Is it enough to call that method on every thread created in
> `KafkaStream` sometime before we call `StreamThread#shutdown`?
>
> @Guozhang: 2. Regarding the API itself, I want to see a more concrete
> > proposal that contains the full signature, e.g.
> does".closeAndLeaveGroup()"
> > include the timeout param as well, etc? My very subjective preference is
> to
> > not differentiate by the function name in case we may want to augment the
> > close function in the future, which would explode the function names :P
> > Instead maybe we can just overload the `close()` function again but with
> a
> > control object, that includes 1) timeout, 2) leave-group flag, and hence
> > can also extend to include more variables just in case. Would like to
> hear
> > others' thoughts as well.
>
>
> I personally prefer the control object pattern too. It will save us from
> the "telescoping constructors" pattern. Also, I found that we already
> introduced this way on `AdminClient`. It sounds consistent to have the same
> pattern in this case.
>
> I updated the `Public Interfaces` section in the KIP to specify full
> signature. As you can see, I suggested putting the `CloseOptions` class in
> the `KafkaStream` class. I feel like it'd be too general name to put it in
> a separate file.
>
> I’m fully open :) Feel free to oppose any.
>
> On Mon, Feb 7, 2022 at 12:50 PM Guozhang Wang  wrote:
>
> > Hello Seung-chan,
> >
> > Thanks for the KIP writeup and summary! I made a pass on it and want to
> > share some of my thoughts:
> >
> > On the very high level, we want to be able to effectively differentiate
> > several cases as follows:
> >
> > 1) There's a network partition / soft failure hence clients cannot reach
> > the broker, temporarily: here we want to give some time to see if the
> > clients can reconnect back, and hence the timeout makes sense.
> > 2) The client is being bounced, i.e. a shutdown followed by a restart:
> here
> > we do not want to trigger any rebalance, but today we can only hope that
> > the timeout is long enough to cover that bounce window.
> > 3) The client is shutdown and won't be back: here we want to trigger the
> > rebalance immediately, but

Re: [VOTE] KIP-770: Replace "buffered.records.per.partition" with "input.buffer.max.bytes"

2022-02-13 Thread Sagar
Thanks Guozhang.

Thanks!
Sagar.

On Mon, Feb 14, 2022 at 7:40 AM Guozhang Wang  wrote:

> Hi Sagar,
>
> Looks good to define it in the NamedCacheMetrics. Though since this is an
> internal implementation detail and neither of the classes are public, we do
> not actually need to define it in the KIP :)
>
>
> Guozhang
>
> On Sat, Feb 12, 2022 at 4:22 AM Sagar  wrote:
>
> > @Guozhang,
> >
> > I have sent an update to this KIP. I have a question though.. Should this
> > new metric be defined in TaskMetrics level or NamedCacheMetrics? I think
> > the latter makes sense as that holds the cache size at a task level and
> > exposes some other cache related metrics as well like hit-ratio.
> >
> > Thanks!
> > Sagar.
> >
> >
> > On Sat, Feb 12, 2022 at 1:14 PM Sagar  wrote:
> >
> > > Hi All,
> > >
> > > There's another amendment proposed for this KIP. We are adding a new
> > > metric type called *cache-size-bytes-total  *to capture the cache size
> in
> > > bytes accumulated by a task.
> > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
> > >
> > > Thanks!
> > > Sagar.
> > >
> > > On Mon, Jan 24, 2022 at 7:55 AM Guozhang Wang 
> > wrote:
> > >
> > >> Thanks Sagar, I'm +1 on the renamed metric.
> > >>
> > >> On Sat, Jan 22, 2022 at 6:56 PM Sagar 
> > wrote:
> > >>
> > >> > Hi All,
> > >> >
> > >> > There is a small update to the KIP whereby the newly introduced
> metric
> > >> > *total-bytes
> > >> > *has been renamed to *input-buffer-bytes-total.*
> > >> >
> > >> > Thanks!
> > >> > Sagar.
> > >> >
> > >> > On Wed, Sep 29, 2021 at 9:57 AM Sagar 
> > >> wrote:
> > >> >
> > >> > > We have 3 binding votes: Sophie/Guozhang/Mathias
> > >> > > and 2 non-binding votes: Josep/Luke and no -1 votes.
> > >> > >
> > >> > > Marking this KIP as accepted.
> > >> > >
> > >> > > Thanks everyone!
> > >> > >
> > >> > > Thanks!
> > >> > > Sagar.
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Wed, Sep 29, 2021 at 7:18 AM Matthias J. Sax  >
> > >> > wrote:
> > >> > >
> > >> > >> +1 (binding)
> > >> > >>
> > >> > >> On 9/28/21 10:40 AM, Sagar wrote:
> > >> > >> > Hi All,
> > >> > >> >
> > >> > >> > Bumping this vote thread again!
> > >> > >> >
> > >> > >> > Thanks!
> > >> > >> > Sagar.
> > >> > >> >
> > >> > >> > On Wed, Sep 8, 2021 at 1:19 PM Luke Chen 
> > >> wrote:
> > >> > >> >
> > >> > >> >> Thanks for the KIP.
> > >> > >> >>
> > >> > >> >> + 1 (non-binding)
> > >> > >> >>
> > >> > >> >> Thanks.
> > >> > >> >> Luke
> > >> > >> >>
> > >> > >> >> On Wed, Sep 8, 2021 at 2:48 PM Josep Prat
> > >> >  > >> > >> >
> > >> > >> >> wrote:
> > >> > >> >>
> > >> > >> >>> +1 (non binding).
> > >> > >> >>>
> > >> > >> >>> Thanks for the KIP Sagar!
> > >> > >> >>> ———
> > >> > >> >>> Josep Prat
> > >> > >> >>>
> > >> > >> >>> Aiven Deutschland GmbH
> > >> > >> >>>
> > >> > >> >>> Immanuelkirchstraße 26, 10405 Berlin
> > >> > >> >>>
> > >> > >> >>> Amtsgericht Charlottenburg, HRB 209739 B
> > >> > >> >>>
> > >> > >> >>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > >> > >> >>>
> > >> > >> >>> m: +491715557497
> > >> > >> >>>
> > >> > >> >>> w: aiven.io
> > >> > >> >>>
> > >> > >> >>> e: josep.p...@aiven.io
> > >> > >> >>>
> > >> > >> >>> On Wed, Sep 8, 2021, 03:29 Sophie Blee-Goldman
> > >> > >> >>  > >> > >> 
> > >> > >> >>> wrote:
> > >> > >> >>>
> > >> > >>  +1 (binding)
> > >> > >> 
> > >> > >>  Thanks for the KIP!
> > >> > >> 
> > >> > >>  -Sophie
> > >> > >> 
> > >> > >>  On Tue, Sep 7, 2021 at 1:59 PM Guozhang Wang <
> > >> wangg...@gmail.com>
> > >> > >> >> wrote:
> > >> > >> 
> > >> > >> > Thanks Sagar, +1 from me.
> > >> > >> >
> > >> > >> >
> > >> > >> > Guozhang
> > >> > >> >
> > >> > >> > On Sat, Sep 4, 2021 at 10:29 AM Sagar <
> > >> sagarmeansoc...@gmail.com>
> > >> > >> >>> wrote:
> > >> > >> >
> > >> > >> >> Hi All,
> > >> > >> >>
> > >> > >> >> I would like to start a vote on the following KIP:
> > >> > >> >>
> > >> > >> >>
> > >> > >> >
> > >> > >> 
> > >> > >> >>>
> > >> > >> >>
> > >> > >>
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
> > >> > >> >>
> > >> > >> >> Thanks!
> > >> > >> >> Sagar.
> > >> > >> >>
> > >> > >> >
> > >> > >> >
> > >> > >> > --
> > >> > >> > -- Guozhang
> > >> > >> >
> > >> > >> 
> > >> > >> >>>
> > >> > >> >>
> > >> > >> >
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> > >
> >
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] Should we automatically close stale PRs?

2022-02-13 Thread Ismael Juma
Hi David,

I think it's a good idea to use the bot for auto-closing stale PRs. The
ideal flow would be:

1. Write a comment and add the stale label
2. If the user responds saying that the PR is still valid, the stale label is
removed
3. Otherwise, the PR is closed

Thanks,
Ismael

On Sat, Feb 5, 2022, 2:22 AM David Jacot  wrote:

> Hi team,
>
> I find our ever-growing backlog of PRs a little frustrating, don't
> you? I just made a pass over the whole list and a huge chunk
> of the PRs are abandoned, outdated or irrelevant to the
> current code base. For instance, we still have PRs opened
> back in 2015.
>
> There is now a GitHub Action [1] for automatically marking
> PRs as stale and automatically closing them as well. How
> would the community feel about enabling this? I think that
> we could mark a PR as stale after one year and close it
> a month later if there is no new activity. Reopening a
> closed PR is really easy, so there is no real harm in closing
> it.
>
> [1] https://github.com/actions/stale
>