Re: [PR] [Docs] Update site for 3.8.1 [kafka-site]

2024-10-29 Thread via GitHub


jlprat commented on code in PR #635:
URL: https://github.com/apache/kafka-site/pull/635#discussion_r1820689299


##
38/js/templateData.js:
##
@@ -19,6 +19,6 @@ limitations under the License.
 var context={
 "version": "38",
 "dotVersion": "3.8",
-"fullDotVersion": "3.8.0",
+"fullDotVersion": "3.8.1-SNAPSHOT",

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [VOTE] 3.9.0 RC2

2024-10-29 Thread Anton Agestam
Hi Colin,

> Why is it bad API design?

Because rather than forcing API designers to specify good defaults, it
forces protocol implementers to inject bad defaults.

The good examples from this thread are host and port: an empty string is not a
valid hostname, and zero is not a valid port, so those are very bad default
values. Similarly, for timestamp fields the resulting value will be the epoch,
which is extremely rarely a useful default.
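A minimal sketch of the behavior being debated, assuming a hypothetical tagged-field reader rather than Kafka's actual serde: when a tagged field is absent from the wire, the decoder falls back to the type's implicit default.

```python
# Illustrative sketch only (not Kafka's real decoder): how implicit
# defaults surface when a tagged field is absent from the wire.
# Tag numbers and type names here are hypothetical.

IMPLICIT_DEFAULTS = {
    "string": "",   # empty string: not a valid hostname
    "int32": 0,     # zero: not a valid port
    "int64": 0,     # for a timestamp field, this is the Unix epoch
}

def read_tagged_field(tagged_fields, tag, field_type):
    """Return the wire value if the tag is present, else the implicit default."""
    if tag in tagged_fields:
        return tagged_fields[tag]
    return IMPLICIT_DEFAULTS[field_type]

# A message where the sender omitted the host/port/timestamp tags entirely:
wire = {}
host = read_tagged_field(wire, 0, "string")
port = read_tagged_field(wire, 1, "int32")
ts = read_tagged_field(wire, 2, "int64")
print(host == "", port, ts)  # prints: True 0 0
```

The point of contention is that these fallbacks are injected by every protocol implementer, rather than being chosen per field by the schema author.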

Cheers,
Anton

Den mån 28 okt. 2024 kl 17:02 skrev Colin McCabe :

> On Sun, Oct 27, 2024, at 01:44, Anton Agestam wrote:
> > Colin
> >
> > I have presented four reasons, I'll list them again below. Please let me
> > know which ones don't already have enough information on the thread.
> >
> > - The behavior is new.
>
> Hi Anton,
>
> This behavior isn't new. I gave an example of tagged fields that have an
> implicit default in 3.8 and earlier.
>
> > - The behavior is undocumented.
>
> It seems like we both agree that the implicit defaults are documented. I
> showed you where in the README.md they are discussed. That section is from
> 2019. Perhaps the disagreement is that you assumed that they didn't apply
> to tagged fields, whereas I assumed that it was obvious that they did.
>
> It looks like Chia-Ping Tsai has opened a JIRA to clarify that implicit
> defaults do indeed apply to tagged fields. I think this will help avoid
> confusion in the future.
>
> > - The behavior is bad API design.
>
> Why is it bad API design?
>
> > - The behavior does not really save bytes *in practice*.
>
> The example you gave shows that the current behavior sends less over the
> wire than your proposed change. Those are not theoretical bytes, they are
> actual bytes.
>
> Saving space on the wire for fields that were not often used was one of
> the explicit goals of the tagged fields KIP, which was KIP-482. As it says
> in the "motivation" section:
>
>  > While [the current] versioning scheme allows us to change the message
> schemas over
>  > time, there are many scenarios that it doesn't support well.  One
> scenario
>  > that isn't well-supported is when we have data that should be sent in
> some
>  > contexts, but not others.  For example, when a MetadataRequest is made
>  > with IncludeClusterAuthorizedOperations set to true, we need to include
>  > the authorized operations in the response.  However, even when
>  > IncludeClusterAuthorizedOperations is set to false, we still must waste
>  > bandwidth sending a set of blank authorized operations fields in the
 > response.  The problem is that the field is semantically optional in
 > the message, but that can't be expressed in the type system for the
 > Kafka RPC protocol.
>
> You can read it here: https://cwiki.apache.org/confluence/x/OhMyBw
>
> Obviously sending defaults over the wire, in cases where this is not
> needed, goes against that.
>
> >
> > I don't see why *fixing* the release candidate to not break documented
> > behavior should require a KIP, I would actually expect the opposite --
> the
> > new behavior that is being introduced should really have required one.
> >
> >> These two behaviors, taken together, save space on the wire
> >
> > Then you are implicitly arguing that the combination of host="" port=0
> > are common enough that this will practically save bytes on the wire, I
> find
> > that hard to believe.
> >
> > For any future schema that we want to save bytes, there is just as much
> > opportunity to save bytes on the wire with my proposal, they just have to
> > explicitly define default nested values in order to do so.
>
> As I said, there is nothing special about 3.9. This behavior has always
> existed.
>
> If you really want to force everyone to explicitly declare a default for
> each field, then just introduce a KIP to do that. I wouldn't vote for it (I
> still don't see why this is better), but this would at least follow our
> usual process.
>
> One of the problems with forcing an explicit default everywhere is that we
> don't really have a syntax for specifying that the default should be the
> empty collection. For collections, the only choice you get is explicitly
> declaring that the default is null.
>
> best,
> Colin
>
>
> >
> > BR,
> > Anton
> >
> >
> > Den sön 27 okt. 2024 kl 02:56 skrev Colin McCabe :
> >
> >> Hi Anton,
> >>
> >> The behavior where we leave out tagged fields when the default value is
> >> present is intentional. As is the behavior where default values are
> treated
> >> as 0, the empty array, etc. when defaults are not explicitly specified.
> >> These two behaviors, taken together, save space on the wire, and are
> >> simpler for us to implement than what you have proposed. You haven't
> >> presented a single reason why this should change.
> >>
> >> There simply isn't any reason to change the current tagged fields
> >> behavior. And even if we wanted to, it would require a KIP, whereas 3.9
> is
> >> past KIP freeze (along with feature freeze, code freeze, and every 

[jira] [Resolved] (KAFKA-17891) Kafka 3.8 no longer works on z/OS due to hard dependency on zstd-jni

2024-10-29 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-17891.

Resolution: Duplicate

> Kafka 3.8 no longer works on z/OS due to hard dependency on zstd-jni
> 
>
> Key: KAFKA-17891
> URL: https://issues.apache.org/jira/browse/KAFKA-17891
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.8.0
> Environment: z/OS
>Reporter: Dave Crighton
>Priority: Major
>
> Kafka 3.8.0 has introduced a hard dependency on zstd-jni. At runtime, this 
> library unpacks a native lib onto the system in a temporary location, which 
> in itself is an insecure practice. It then attempts to load this library.
>  
> Unfortunately this fails on z/OS because zstd-jni does not include a library 
> for this platform, which means we are unable to update to the 
> Kafka 3.8 Java client for IBM Integration Bus on z/OS.
>  
> Since the prerequisite is only for optional compression, it should be a soft 
> dependency that fails only when the functionality is actually used. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17890) Move DelayedOperationPurgatory to server-common

2024-10-29 Thread Mickael Maison (Jira)
Mickael Maison created KAFKA-17890:
--

 Summary: Move DelayedOperationPurgatory to server-common
 Key: KAFKA-17890
 URL: https://issues.apache.org/jira/browse/KAFKA-17890
 Project: Kafka
  Issue Type: Sub-task
Reporter: Mickael Maison
Assignee: Mickael Maison


It's used by RemoteLogManager, which will move to the storage module, so we need 
to move DelayedOperationPurgatory to server-common rather than server. 





Re: [DISCUSS] KIP-1058: Txn consumer exerts pressure on remote storage when reading non-txn topic

2024-10-29 Thread Divij Vaidya
Let's get the ball rolling (again) on this one.

Kamal, could you please add the following to the KIP:
1. the API as discussed above. Please add the failure modes for this API as
well such as the exceptions thrown and a recommendation on how a caller is
expected to handle those. I am assuming that the three parameters for this
API will be topicPartition, epoch and offset.
2. implementation details for Topic based RLMM. I am assuming that the
plugin will default the field to false if this field is absent (case of old
metadata).
3. In the test plan section, additionally, we need to assert that we don't
read metadata for all segments (i.e. it is not a linear search) from the
Topic based RLMM.
4. in the compatibility section, please document how the existing clusters
with Tiered Storage metadata will work during/after a rolling upgrade to a
version which contains this new change.
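A rough sketch of the lookup the proposed API implies, given the three assumed parameters (topicPartition, epoch, offset). Python's `bisect` over a sorted list stands in for the skip-list a JVM plugin would likely build (e.g. a ConcurrentSkipListMap); all names below are illustrative, not part of the real RLMM.

```python
# Hedged sketch: given an epoch's segment metadata, return the next
# segment at or after a start offset that actually carries a txn index,
# without a linear scan over all segments. Names are hypothetical.
import bisect

class EpochTxnIndexState:
    def __init__(self):
        # sorted start offsets of only those segments that have a txn index
        self._offsets = []
        self._segments = {}  # start offset -> segment metadata

    def add_segment(self, start_offset, metadata, has_txn_index):
        # segments without a txn index are simply not indexed here,
        # which keeps the structure empty in the predominant case
        if has_txn_index:
            bisect.insort(self._offsets, start_offset)
            self._segments[start_offset] = metadata

    def next_segment_with_txn_index(self, offset):
        """First txn-index-bearing segment with start offset >= offset,
        or None (the Optional.empty case)."""
        i = bisect.bisect_left(self._offsets, offset)
        if i == len(self._offsets):
            return None
        return self._segments[self._offsets[i]]

state = EpochTxnIndexState()
state.add_segment(0, "seg-0", has_txn_index=False)
state.add_segment(100, "seg-100", has_txn_index=True)
state.add_segment(200, "seg-200", has_txn_index=False)
state.add_segment(300, "seg-300", has_txn_index=True)
print(state.next_segment_with_txn_index(150))  # prints: seg-300
```

This also illustrates test-plan point 3 above: the lookup cost depends on the number of txn-index-bearing segments, not on the total segment count.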

--
Divij Vaidya



On Fri, Oct 11, 2024 at 12:26 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Bump for review.
>
> If the additional proposal looks good, I'll append it to the KIP. PTAL.
>
> New API in RLMM#nextRemoteLogSegmentMetadataWithTxnIndex
>
> --
> Kamal
>
> On Sun, Oct 6, 2024 at 7:20 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Christo,
> >
> > Thanks for the review!
> >
> > Adding the new API `nextRemoteLogSegmentMetadataWithTxnIndex` in RLMM
> > helps to
> > reduce the complexity of linear search. With this API, we have to:
> >
> > 1. Maintain one more skip-list [1] for each of the epochs in the
> partition
> > in RLMM that might
> > increase the memory usage of TopicBased RLMM implementation.
> > 1a) The skip-list will be empty when there are no aborted txn entries
> > for a partition/epoch which is the predominant case.
> > 1b) The skip-list will act as a duplicate when *most* of the segments
> > have aborted txn entries; assuming aborted txns are quite rare, this should
> > be fine.
> > 2. Change the logic to retrieve the aborted txns (we have to query the
> > nextRLSMWithTxnIndex
> > for each of the leader-epoch).
> > 3. Logic divergence from how we retrieve the aborted txn entries compared
> > to the local-log.
> >
> > The approach looks good to me. If everyone is aligned, then we can
> proceed
> > to add this API to RLMM.
> >
> > Another option I was thinking of is to capture the `lastStableOffsetLag`
> > [2] while rotating the segment.
> > But, that is a bigger change we can take later.
> >
> > [1]:
> >
> https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogLeaderEpochState.java?L43
> > [2]:
> >
> https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L432
> >
> >
> > Thanks,
> > Kamal
> >
> > On Fri, Oct 4, 2024 at 4:21 PM Christo Lolov 
> > wrote:
> >
> >> Heya,
> >>
> >> Apologies for the delay. I have been thinking about this problem
> recently
> >> as well and while I believe storing a boolean in the metadata is good, I
> >> think we can do better by introducing a new method to the RLMM along the
> >> lines of
> >>
> >> Optional<RemoteLogSegmentMetadata>
> >> nextRemoteLogSegmentMetadataWithTxnIndex(TopicIdPartition topicIdPartition,
> >> int epochForOffset, long offset) throws RemoteStorageException
> >>
> >> This will help plugin implementers to build optimisations such as skip
> >> lists which will give them the next segment quicker than a linear
> search.
> >>
> >> I am keen to hear your thoughts!
> >>
> >> Best,
> >> Christo
> >>
> >> On Fri, 4 Oct 2024 at 10:48, Kamal Chandraprakash <
> >> kamal.chandraprak...@gmail.com> wrote:
> >>
> >> > Hi Luke,
> >> >
> >> > Thanks for the review!
> >> >
> >> > > Do you think it is helpful if we store the "least abort start offset
> >> in
> >> > the
> >> > segment", and -1 means no txnIndex. So that we can have a way to know
> >> if we
> >> > need to fetch this txn index or not.
> >> >
> >> > 1. No, this change won't have an effect. To find the upper-bound
> offset
> >> > [1], we have to
> >> > fetch that segment's offset index file. The RemoteIndexCache [2]
> >> > fetches all the 3
> >> > index files together and caches them for subsequent use, so this
> >> > improvement
> >> > won't have an effect as the current segment txn index gets
> >> downloaded
> >> > anyway.
> >> >
> >> > 2. The reason for choosing boolean is to make the change backward
> >> > compatible.
> >> >  There can be existing RLM events for the uploaded segments. The
> >> > default
> >> >  value of `txnIdxEmpty` is false so the *old* RLM events are
> >> assumed to
> >> > contain
> >> >  the txn index files and those files are downloaded if they exist.
> >> >
> >> > [1]:
> >> >
> >> >
> >>
> https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1732
> >> > [2]:
> >> >
> >> >
> >>
> https://sourcegraph.com/github.com/apache/kafka@trunk

[jira] [Resolved] (KAFKA-17875) Align KRaft controller count recommendations

2024-10-29 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-17875.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Align KRaft controller count recommendations
> 
>
> Key: KAFKA-17875
> URL: https://issues.apache.org/jira/browse/KAFKA-17875
> Project: Kafka
>  Issue Type: Improvement
>  Components: docs, kraft
>Reporter: Mickael Maison
>Assignee: Jhen Yung Hsu
>Priority: Major
> Fix For: 4.0.0
>
>
> In https://kafka.apache.org/documentation/#kraft_deployment we strongly 
> suggest using no more than 3 controllers. This recommendation holds until we 
> complete the implementation of 
> [KIP-996|https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote].
> However just a few [paragraphs 
> above|https://kafka.apache.org/documentation/#kraft_voter] we mention users 
> typically pick between 3 and 5 controllers.
> Until KIP-996 is implemented, we should probably mention only 3 controllers.





Re: [DISCUSS] KIP-1043: Administration of groups

2024-10-29 Thread David Jacot
Thanks, Andrew. LGTM.

On Mon, Oct 28, 2024 at 3:58 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Hi David,
> I've updated the KIP accordingly. GROUP_ID_NOT_FOUND and an
> error message are used to disambiguate the error cases when the group
> ID cannot be found.
>
> Thanks for your time reviewing this KIP.
>
> Andrew
>
> 
> From: David Jacot 
> Sent: 28 October 2024 13:37
> To: dev@kafka.apache.org 
> Subject: Re: [DISCUSS] KIP-1043: Administration of groups
>
> Hi Andrew,
>
> Thanks for your response. I think that your last proposal makes sense to
> me. It seems that we have no choice but to also change the DescribeGroups
> API. However, I still have a preference for the GROUP_ID_NOT_FOUND
> approach. It is simpler in my opinion and consistent across all APIs. From
> an UX perspective, I don't really see the value of
> the INCONSISTENT_GROUP_TYPE error too. In the end, if someone looks up a
> share group called X but X is not a share group, the share group X does not
> exist / is not found. The new error would be useful if users would want to
> handle it to request the correct type but I don't believe that it is a
> relevant use case.
>
> DJ12: Perfect, thanks.
>
> Best,
> David
>
> On Tue, Oct 22, 2024 at 3:46 PM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
> > Hi David,
> > Thanks for your response.
> >
> > I really don't like an API response whose error message is
> > a significant part of the interface. If we have code that checks
> > for specific error messages and takes different actions, we've
> > got it wrong. If we are just displaying the error text or adding
> > it to an exception, that's fine.
> >
> > I think the tricky part here is that a "consumer group" could
> > actually be a modern consumer group or a classic consumer
> > group.
> >
> > How about this?
> >
> > The DescribeGroups API is used to describe classic groups only.
> > A new version of the response (v6) adds an error message. It
> > would be very weird to populate the error message for an
> > error code of NONE, so v6 makes two changes:
> > a) If the group ID exists but it's not a classic group, the error code
> > INCONSISTENT_GROUP_TYPE is returned along with an error message
> > such as "Group %s is not a classic group".
> > b) If the group ID does not exist, the error code
> > GROUP_ID_NOT_FOUND is returned along with an error message
> > such as "Group %s not found". Formerly, this used to return
> > error code NONE and a bogus dead group.
> >
> > Admin#describeConsumerGroups first uses the
> > ConsumerGroupDescribe API:
> > a) If the group ID is a modern consumer group, the error code
> > NONE is returned along with the group description.
> > b) If the group ID exists but it's not a modern consumer group,
> > the error code INCONSISTENT_GROUP_TYPE is returned
> > along with an error message such as "Group %s is not a consumer
> > group".
> > c) If the group ID does not exist, the error code GROUP_ID_NOT_FOUND
> > is returned along with an error message such as
> > "Group %s not found".
> >
> > In case (b), the admin client then uses the DescribeGroups API
> > to see whether the inconsistent group is a classic group or not.
> > The bogus dead group is no longer used in the admin client interface.
> > If the group doesn't exist, it's an exception.
> >
> > Admin#describeShareGroups uses the ShareGroupDescribe API:
> > a) If the group ID is a share group, the error code NONE is returned
> > along with the group description.
> > b) If the group ID exists but it's not a share group, the error code
> > INCONSISTENT_GROUP_TYPE is returned along with an error
> > message such as "Group %s is not a share group".
> > c) If the group ID does not exist, the error code GROUP_ID_NOT_FOUND
> > is returned along with an error message such as "Group %s not found".
> >
> > Admin#describeClassicGroups uses the DescribeGroups v6 API:
> > a) If the group ID is a classic group, the error code NONE is returned
> > along with the group description.
> > b) If the group ID exists but it's not a classic group, the error code
> > INCONSISTENT_GROUP_TYPE is returned along with an error message
> > such as "Group %s is not a classic group".
> > c) If the group ID does not exist, the error code GROUP_ID_NOT_FOUND
> > is returned along with an error message such as "Group %s not found".
> > There is no bogus dead group.
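The describeConsumerGroups fallback in case (b) above can be sketched roughly as follows, with hypothetical names and in-memory stand-ins for the two RPCs rather than the real AdminClient:

```python
# Illustrative sketch of the disambiguation flow described above: try
# ConsumerGroupDescribe first; on INCONSISTENT_GROUP_TYPE fall back to
# DescribeGroups v6 to learn whether the group is a classic group.
# The in-memory "cluster" and helper names mirror the KIP discussion
# but are assumptions, not real APIs.
NONE = "NONE"
GROUP_ID_NOT_FOUND = "GROUP_ID_NOT_FOUND"
INCONSISTENT_GROUP_TYPE = "INCONSISTENT_GROUP_TYPE"

GROUPS = {"cg1": "consumer", "classic1": "classic", "sg1": "share"}

def consumer_group_describe(group_id):
    if group_id not in GROUPS:
        return GROUP_ID_NOT_FOUND, None
    if GROUPS[group_id] != "consumer":
        return INCONSISTENT_GROUP_TYPE, None
    return NONE, {"id": group_id, "type": "consumer"}

def describe_groups_v6(group_id):
    if group_id not in GROUPS:
        return GROUP_ID_NOT_FOUND, None
    if GROUPS[group_id] != "classic":
        return INCONSISTENT_GROUP_TYPE, None
    return NONE, {"id": group_id, "type": "classic"}

def describe_consumer_group(group_id):
    """Modern-then-classic fallback; raises on an unknown group
    instead of returning a bogus dead group."""
    err, desc = consumer_group_describe(group_id)
    if err == NONE:
        return desc
    if err == INCONSISTENT_GROUP_TYPE:
        err2, desc2 = describe_groups_v6(group_id)
        if err2 == NONE:
            return desc2  # it was a classic consumer group
        raise ValueError(f"Group {group_id} is not a consumer group")
    raise KeyError(f"Group {group_id} not found")

print(describe_consumer_group("classic1"))  # prints: {'id': 'classic1', 'type': 'classic'}
```

Note that the same flow works if GROUP_ID_NOT_FOUND is returned in place of INCONSISTENT_GROUP_TYPE; the separate error code only changes which branch the client can distinguish without inspecting the error message.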
> >
> >
> > Having run through these scenarios, they would work almost the same
> > if GROUP_ID_NOT_FOUND was used in the place of INCONSISTENT_GROUP_TYPE.
> > Personally, I prefer the separate error code and consequently exception
> > because you don't have to look inside the error message to figure out
> > what went wrong. But actually, making it all use GROUP_ID_NOT_FOUND would
> > give the same experience to the operator using the command line tools.
> >
> > If you're still convinced that GROUP_ID_NOT_FOUND is preferable, I'll
> > change the KIP to remove the ne

Kafka Appender for Log4j Core 2.x/3.x

2024-10-29 Thread Piotr P. Karwasz

Hi all,

As some of you are probably aware, the Log4j 1.x Kafka Appender was 
removed in KAFKA-17860[1].


The Apache Log4j project currently maintains a Kafka Appender for Log4j 
Core 2.x[2], but following a poll on `log4j-user@logging`[3] last year 
and a long discussion in the PMC, the appender has been deprecated and 
removed from the 3.x branch. The reasons behind this decision 
are mostly:


* There are longstanding bugs opened against the Kafka Appender ([4] and 
[5]) that need to be addressed,


* There are no Apache Log4j devs currently working with Kafka. A 
personal interpretation of this fact is that the features implemented in 
the Kafka Appender are based on the Kafka documentation and not on what 
people actually use.


* There is a mismatch between the lifecycle of the Kafka Appender and 
Log4j Core: during the lifetime of Log4j Core 2.x, several major versions 
of Kafka were released. Our appender is basically using only the 
features from Kafka 1.x.


* Last but not least, we are not sure if the Kafka Appender is used at 
all. In 2.x the Kafka Appender is included in `log4j-core`, so it is 
hard to get usage statistics from downloads.


This is why I would like to ask the Kafka community in general and the 
dev team in particular, whether:


* Is there a need for a Kafka Appender at all? Instinctively I would say 
yes, because Apache Flume is dormant, so out of the four appenders with 
guaranteed delivery[6] only Jakarta JMS and JeroMQ will remain.


* Are there some Kafka devs interested in developing one? The old 
appender can be taken as prototype, but I suspect that a lot of work is 
required to make it industrial-grade.


* Where should such an appender be hosted? This is just a minor 
technical problem. If there is a Kafka dev willing to provide long term 
help on Kafka-related issues, we can host it at Apache Logging Services. 
If the Apache Kafka project wants to host the appender, I can provide 
long term support on Log4j Core-related issues.


Piotr

[1] https://issues.apache.org/jira/browse/KAFKA-17860

[2] 
https://logging.apache.org/log4j/2.x/manual/appenders/message-queue.html#KafkaAppender


[3] https://lists.apache.org/thread/76f260b933ol7og757123lyl1ckjvm8y

[4] 
https://issues.apache.org/jira/browse/LOG4J2-1650?jql=project%20%3D%20LOG4J2%20AND%20resolution%20%3D%20Unresolved%20AND%20text%20~%20%22kafka%22%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC


[5] 
https://github.com/apache/logging-log4j2/issues?q=is%3Aissue+is%3Aopen+kafka


[6] https://logging.apache.org/log4j/2.x/manual/appenders/message-queue.html



[jira] [Created] (KAFKA-17891) Kafka 3.8 no longer works on z/OS due to hard dependency on zstd-jni

2024-10-29 Thread Dave Crighton (Jira)
Dave Crighton created KAFKA-17891:
-

 Summary: Kafka 3.8 no longer works on z/OS due to hard dependency 
on zstd-jni
 Key: KAFKA-17891
 URL: https://issues.apache.org/jira/browse/KAFKA-17891
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 3.8.0
 Environment: z/OS
Reporter: Dave Crighton


Kafka 3.8.0 has introduced a hard dependency on zstd-jni. At runtime, this 
library unpacks a native lib onto the system in a temporary location, which in 
itself is an insecure practice. It then attempts to load this library.

 

Unfortunately this fails on z/OS because zstd-jni does not include a library 
for this platform, which means we are unable to update to the Kafka 
3.8 Java client for IBM Integration Bus on z/OS.

 

Since the prerequisite is only for optional compression, it should be a soft 
dependency that fails only when the functionality is actually used. 
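The soft-dependency pattern the report asks for can be sketched as follows, in Python for brevity (the Kafka client is Java; this only illustrates the deferral). Module and function names are illustrative:

```python
# Sketch of the "soft dependency" pattern: resolve the compression
# library only when that codec is actually requested, so platforms
# where the native lib is unavailable (e.g. z/OS) still work for
# uncompressed traffic. Names here are hypothetical.
def compress(payload: bytes, codec: str) -> bytes:
    if codec == "none":
        return payload  # no optional dependency ever touched
    if codec == "zstd":
        try:
            import zstandard  # deferred: only imported when zstd is requested
        except ImportError as e:
            raise RuntimeError("zstd requested but library unavailable") from e
        return zstandard.ZstdCompressor().compress(payload)
    raise ValueError(f"unknown codec: {codec}")
```

With this shape, a client configured without compression never loads (or unpacks) the native library at all, and the failure is reported only at the point of use.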





Configuration format for Log4j Core 2.x

2024-10-29 Thread Piotr P. Karwasz

Hi,

In the context of the current migration process from Log4j 1.x/Reload4j 
to Log4j Core 2.x[1], I believe that the choice of configuration format 
used by the Kafka binary distribution should receive particular 
attention.


Log4j Core 2.x supports four native configuration formats (XML, JSON, 
YAML and Java Properties[2]). The version 1.x XML and Java Properties 
configuration file formats are incompatible with the new formats, but 
they can be converted at runtime, using the `log4j-1.2-api` artifact[3]. 
This is of course a transitional option, since the old formats are not 
extensible and do not offer most of the features of Log4j Core 2.x.


While the 2.x Java Properties configuration format might seem like the 
natural migration path for the current Apache Kafka configuration, I 
would strongly advise against this choice. The Log4j Core 2.x runtime 
has a hierarchical structure, which can be easily reflected by formats 
like XML, JSON or YAML, but not so much by Java Properties. For this 
reason the `*.properties` configuration format is:


* very verbose,

* full of quirks intended to make it less verbose[4].

If we exclude Java Properties, only three choices remain:

* The default XML format, which has no dependencies (if we exclude the 
JPMS `java.xml` module) and has a schema[5] that can be used to validate 
the configurations. This might, however, strongly contrast with the 
other Kafka configuration files that are maintained as Java Properties.


* The JSON format has a dependency on `jackson-databind`, which is 
already present in the Kafka binary distribution. It is a matter of 
personal taste, but I find it even more verbose than the Java Properties 
format (although it does not have quirks). In Log4j Core 3.x the 
dependency on `jackson-databind` has been replaced with an in-house parser.


* My favorite would be the YAML format, which would require the addition 
of `jackson-dataformat-yaml` (and its `snakeyaml` transitive dependency) 
to the Kafka runtime. The advantage, however, is that it is 
probably the least verbose of the available formats.
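For illustration, a rough sketch of what a minimal broker logging configuration might look like in the Log4j Core 2.x YAML format. The logger and appender names are assumptions loosely modeled on Kafka's stock log4j.properties, not a drop-in file:

```yaml
# Hedged sketch of a minimal Log4j Core 2.x YAML configuration;
# shown to compare verbosity, not as a tested replacement.
Configuration:
  Appenders:
    Console:
      name: STDOUT
      PatternLayout:
        pattern: "[%d] %p %m (%c)%n"
  Loggers:
    Root:
      level: INFO
      AppenderRef:
        ref: STDOUT
    Logger:
      - name: kafka.request.logger
        level: WARN
        additivity: false
        AppenderRef:
          ref: STDOUT
```

The hierarchy of appenders and loggers maps directly onto the YAML nesting, which is exactly what the flat Java Properties format has to emulate with long dotted keys.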


What do you think, which one of the configuration formats available in 
Log4j Core 2.x should be used by default by Kafka?


Piotr

[1] https://github.com/apache/kafka/pull/17373

[2] 
https://logging.apache.org/log4j/2.x/manual/configuration.html#configuration-factories


[3] 
https://logging.apache.org/log4j/2.x/migrate-from-log4j1.html#ConfigurationCompatibility


[4] 
https://logging.apache.org/log4j/2.x/manual/configuration.html#java-properties-features


[5] https://logging.apache.org/xml/ns/



Re: [PR] [Docs] Update site for 3.8.1 [kafka-site]

2024-10-29 Thread via GitHub


jlprat merged PR #635:
URL: https://github.com/apache/kafka-site/pull/635





[jira] [Resolved] (KAFKA-15961) Flaky test: testTopicIdPersistsThroughControllerRestart() – kafka.controller.ControllerIntegrationTest

2024-10-29 Thread Apoorv Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apoorv Mittal resolved KAFKA-15961.
---
Resolution: Won't Fix

Marking ZK failed test as won't fix.

> Flaky test:  testTopicIdPersistsThroughControllerRestart() – 
> kafka.controller.ControllerIntegrationTest
> ---
>
> Key: KAFKA-15961
> URL: https://issues.apache.org/jira/browse/KAFKA-15961
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apoorv Mittal
>Priority: Major
>
> PR build: 
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14767/15/tests/]
>  
> {code:java}
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:627)
>   at scala.None$.get(Option.scala:626)
>   at kafka.controller.ControllerIntegrationTest.testTopicIdPersistsThroughControllerRestart(ControllerIntegrationTest.scala:1684)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
>   at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>   at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>   at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
>   at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
>   at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
>   at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>   at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>   at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>   at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>   at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> {code}





Re: [VOTE] 3.9.0 RC5

2024-10-29 Thread Justine Olshan
Hey Colin,

Thanks for the new RC. I've tested the --feature functionality that was
missing before. Seems to be working as expected now.
As a side note, I was looking into how to configure KIP-853 and it took me
a moment to find the instructions in the documentation. I wonder if it
could be included in the notable changes section of the upgrade notes. Not
a blocker, as I believe we can still update documentation.

I did some other spot checks on the rest of the release. +1 (binding) from
me

Justine

On Tue, Oct 29, 2024 at 12:45 PM Colin McCabe  wrote:

> Thanks, Anton. And thanks Chia-Ping Tsai for taking a look at how we can
> improve the docs here...
>
> best,
> Colin
>
>
> On Tue, Oct 29, 2024, at 02:39, Anton Agestam wrote:
> > Hi Chia-Ping,
> >
> > Thanks for pointing those two fields out. I retract my -1.
> >
> > Cheers,
> > Anton
> >
> > Den sön 27 okt. 2024 kl 17:40 skrev Chia-Ping Tsai  >:
> >
> >> hi Anton
> >>
> >> Thanks for sharing your insights on Kafka serialization—it’s really cool
> >> and interesting to me. Additionally, you inspired me to file a JIRA
> issue
> >> (KAFKA-17882) to improve the documentation.
> >>

Re: [VOTE] 3.9.0 RC5

2024-10-29 Thread Colin McCabe
Thanks, Anton. And thanks Chia-Ping Tsai for taking a look at how we can 
improve the docs here...

best,
Colin


On Tue, Oct 29, 2024, at 02:39, Anton Agestam wrote:
> Hi Chia-Ping,
>
> Thanks for pointing those two fields out. I retract my -1.
>
> Cheers,
> Anton
>
> Den sön 27 okt. 2024 kl 17:40 skrev Chia-Ping Tsai :
>
>> hi Anton
>>
>> Thanks for sharing your insights on Kafka serialization—it’s really cool
>> and interesting to me. Additionally, you inspired me to file a JIRA issue
>> (KAFKA-17882) to improve the documentation.
>>
>> The most important aspect of Kafka is compatibility, and the undocumented
>> behavior has been in place for some time [0][1]. This means there’s no need
>> to rush your improvement for 3.9, as we’ll need to explicitly add default
>> values after applying your patch to ensure we generate the same binary data.
>>
>> In short, we can improve the documentation first. In the meantime, we can
>> continue discussing behavior clarification for 4.0, and RM can keep running
>> the RC for 3.9. Everything is on track.
>>
>> Best,
>> Chia-Ping
>>
>> [0]
>> https://github.com/apache/kafka/blob/3.8/clients/src/main/resources/common/message/FetchSnapshotResponse.json#L43
>> [1]
>> https://github.com/apache/kafka/blob/3.8/group-coordinator/src/main/resources/common/message/ConsumerGroupMemberMetadataValue.json#L39
>>
>> On 2024/10/27 15:28:05 Anton Agestam wrote:
>> > -1, refer to comments on the RC 2 thread.
>> >
>> > Den sön 27 okt. 2024 kl 02:51 skrev Colin McCabe :
>> >
>> > > This is the RC5 candidate for the release of Apache Kafka 3.9.0.
>> > >
>> > > - This is a major release, the final one in the 3.x line. (There may of
>> > > course be other minor releases in this line, such as 3.9.1.)
>> > > - Tiered storage will be considered production-ready in this release.
>> > > - This will be the final major release to feature the deprecated
>> ZooKeeper
>> > > mode.
>> > >
>> > > This release includes the following KIPs:
>> > > - KIP-853: Support dynamically changing KRaft controller membership
>> > > - KIP-1057: Add remote log metadata flag to the dump log tool
>> > > - KIP-1049: Add config log.summary.interval.ms to Kafka Streams
>> > > - KIP-1040: Improve handling of nullable values in InsertField,
>> > > ExtractField, and other transformations
>> > > - KIP-1031: Control offset translation in MirrorSourceConnector
>> > > - KIP-1033: Add Kafka Streams exception handler for exceptions
>> occurring
>> > > during processing
>> > > - KIP-1017: Health check endpoint for Kafka Connect
>> > > - KIP-1025: Optionally URL-encode clientID and clientSecret in
>> > > authorization header
>> > > - KIP-1005: Expose EarliestLocalOffset and TieredOffset
>> > > - KIP-950: Tiered Storage Disablement
>> > > - KIP-956: Tiered Storage Quotas
>> > >
>> > > Release notes for the 3.9.0 release:
>> > >
>> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc5/RELEASE_NOTES.html
>> > >
>> > > *** Please download, test and vote by October 30, 2024.
>> > >
>> > > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > > https://kafka.apache.org/KEYS
>> > >
>> > > * Release artifacts to be voted upon (source and binary):
>> > > https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc5/
>> > >
>> > > * Docker release artifacts to be voted upon:
>> > > apache/kafka:3.9.0-rc5
>> > > apache/kafka-native:3.9.0-rc5
>> > >
>> > > * Maven artifacts to be voted upon:
>> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> > >
>> > > * Javadoc:
>> > > https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc5/javadoc/
>> > >
>> > > * Documentation:
>> > > https://kafka.apache.org/39/documentation.html
>> > >
>> > > * Protocol:
>> > > https://kafka.apache.org/39/protocol.html
>> > >
>> > > * Tag to be voted upon (off 3.9 branch) is the 3.9.0-rc5 tag:
>> > > https://github.com/apache/kafka/releases/tag/3.9.0-rc5
>> > >
>> > > * Successful Docker Image Github Actions Pipeline for 3.9 branch:
>> > > Docker Build Test Pipeline (JVM):
>> > > https://github.com/apache/kafka/actions/runs/11535300463
>> > > Docker Build Test Pipeline (Native):
>> > > https://github.com/apache/kafka/actions/runs/11535328957
>> > >
>> > > Thanks to everyone who helped with this release candidate, either by
>> > > contributing code, testing, or documentation.
>> > >
>> > > Regards,
>> > > Colin
>> > >
>> >
>>


[jira] [Created] (KAFKA-17898) Separate Epoch Bump Scenarios and Error Handling in TV2

2024-10-29 Thread Ritika Reddy (Jira)
Ritika Reddy created KAFKA-17898:


 Summary: Separate Epoch Bump Scenarios and Error Handling in TV2
 Key: KAFKA-17898
 URL: https://issues.apache.org/jira/browse/KAFKA-17898
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ritika Reddy
Assignee: Ritika Reddy






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Configuration format for Log4j Core 2.x

2024-10-29 Thread David Arthur
Thanks for starting this discussion, Piotr!

I think moving towards a more modern format for our logging config would be
great. Personally, I think YAML would be the nicest to work with as an
operator. It should also be very familiar to those who work in Docker and
Kubernetes.

A few thoughts

1. This would establish two different config formats in Kafka. Properties
for kafka configs and YAML/XML/JSON for log configs. Whatever we choose for
the log4j2 config format, we should also consider it as a possible format
for Kafka itself (assuming we ever move towards modernizing our own
configs).

2. How do we determine which type of config file has been given? Do we try
to infer it based on file extension? What is the behavior if both old and
new files exist?

3. Since a bit of time has passed since we voted on KIP-653, we may need to
amend it to lay out a deprecation path for the log4j 1.x properties format

4. Data bindings and parsers are common sources of CVEs. It looks like
Snakeyaml is no exception (
https://www.cvedetails.com/version-list/0/66013/1/), though it doesn't look
much worse than Jackson. Just to point out, this will add a bit of
dependency overhead as we keep up with security patches.


-David A


On Tue, Oct 29, 2024 at 8:48 AM Piotr P. Karwasz 
wrote:

> Hi,
>
> In the context of the current migration process from Log4j 1.x/Reload4j
> to Log4j Core 2.x[1], I believe that the choice of configuration format
> used by the Kafka binary distribution, should receive a particular
> attention.
>
> Log4j Core 2.x supports four native configuration formats (XML, JSON,
> YAML and Java Properties[2]). The version 1.x XML and Java Properties
> configuration file formats are incompatible with the new formats, but
> they can be converted at runtime, using the `log4j-1.2-api` artifact[3].
> This is of course a transitional option, since the old formats are not
> extensible and do not offer most of the features of Log4j Core 2.x.
>
> While the 2.x Java Properties configuration format might seem as the
> natural migration path for the current Apache Kafka configuration, I
> would strongly advise against this choice. The Log4j Core 2.x runtime
> has a hierarchical structure, which can be easily reflected by formats
> like XML, JSON or YAML, but not so much by Java Properties. For this
> reason the `*.properties` configuration format is:
>
> * very verbose,
>
> * contains a lot of quirks to make it less verbose[4].
>
> If we exclude Java Properties, only three choices remain:
>
> * The default XML format, which has no dependencies (if we exclude the
> JPMS `java.xml` module) and has a schema[5] that can be used to validate
> the configurations. This might, however, strongly contrast with the
> other Kafka configuration files that are maintained as Java Properties.
>
> * The JSON format has a dependency on `jackson-databind`, which is
> already present in the Kafka binary distribution. It is a matter of
> personal taste, but I find it even more verbose than the Java Properties
> format (although it does not have quirks). In Log4j Core 3.x the
> dependency on `jackson-databind` has been replaced with an in-house parser.
>
> * My favorite would be the YAML format, that would require the addition
> of `jackson-dataformat-yaml` (and its `snakeyaml` transitive dependency)
> to the Kafka runtime. The advantage, however, would be that it is
> probably the less verbose of the available formats.
>
> What do you think, which one of the configuration formats available in
> Log4j Core 2.x should be used by default by Kafka?
>
> Piotr
>
> [1] https://github.com/apache/kafka/pull/17373
>
> [2]
>
> https://logging.apache.org/log4j/2.x/manual/configuration.html#configuration-factories
>
> [3]
>
> https://logging.apache.org/log4j/2.x/migrate-from-log4j1.html#ConfigurationCompatibility
>
> [4]
>
> https://logging.apache.org/log4j/2.x/manual/configuration.html#java-properties-features
>
> [5] https://logging.apache.org/xml/ns/
>
>
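
For comparison, a minimal root-logger setup in the legacy Log4j 1.x Properties format (the style Kafka's current config/log4j.properties uses) next to the same configuration in the Log4j Core 2.x YAML format discussed above. This is an illustrative sketch, not the proposed Kafka configuration:

```
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
```

```yaml
Configuration:
  Appenders:
    Console:
      name: STDOUT
      PatternLayout:
        Pattern: "[%d] %p %m (%c)%n"
  Loggers:
    Root:
      level: INFO
      AppenderRef:
        ref: STDOUT
```

The YAML form makes the hierarchical appender/logger structure explicit, which is the main argument against mapping it onto flat Java Properties keys.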

-- 
David Arthur


[jira] [Resolved] (KAFKA-17128) Make node.id immutable after removing zookeeper migration

2024-10-29 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-17128.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Make node.id immutable after removing zookeeper migration
> -
>
> Key: KAFKA-17128
> URL: https://issues.apache.org/jira/browse/KAFKA-17128
> Project: Kafka
>  Issue Type: Improvement
>Reporter: TengYao Chi
>Assignee: TengYao Chi
>Priority: Minor
> Fix For: 4.0.0
>
>
> Making `nodeId` mutable was a workaround for the de-synchronization between 
> the generated `brokerId` and `nodeId`. It should be made immutable again once 
> ZooKeeper support is removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17893) Support record keys in the foreignKeyExtractor argument of KTable foreign join

2024-10-29 Thread Piotr Jaszkowski (Jira)
Piotr Jaszkowski created KAFKA-17893:


 Summary: Support record keys in the foreignKeyExtractor argument 
of KTable foreign join
 Key: KAFKA-17893
 URL: https://issues.apache.org/jira/browse/KAFKA-17893
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Piotr Jaszkowski


Only accepting record values in the foreignKeyExtractor is an unnecessary 
limitation that forces awkward topologies.

See discussion in 
https://forum.confluent.io/t/why-foreignkeyextractor-does-not-accept-key/12427/2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Apache Kafka 3.8.1

2024-10-29 Thread Apoorv Mittal
Thanks Josep for all the hard work for the release.

Regards,
Apoorv Mittal



On Tue, Oct 29, 2024 at 2:51 PM Mickael Maison 
wrote:

> Thanks Josep for running this release!
>
> Mickael
>
> On Tue, Oct 29, 2024 at 3:23 PM Josep Prat  wrote:


Re: Configuration format for Log4j Core 2.x

2024-10-29 Thread Piotr P. Karwasz

Hi David,

On 29.10.2024 16:30, David Arthur wrote:

2. How do we determine which type of config file has been given? Do we try
to infer it based on file extension? What is the behavior if both old and
new files exist?


I believe that the current PR[1] shows the correct behavior. Upgraded 
systems will have a residual `config/log4j.properties` file. If such a 
file exists, it should be used, but users should be warned about its 
deprecation. If the `config/log4j.properties` file does not exist and a 
`config/log4j2.yaml` file exists, the new file should be used.


[1] https://github.com/apache/kafka/pull/17373
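
The precedence described above could be sketched as follows (a hypothetical helper for illustration, not the actual code from the PR; the file names match those discussed in this thread):

```python
import os

def pick_log4j_config(config_dir):
    """Return (config file path, is_deprecated).

    A residual 1.x log4j.properties file wins, and the caller should
    emit a deprecation warning for it; otherwise the new log4j2.yaml
    file is used. Sketch only; the real selection logic lives in the PR.
    """
    legacy = os.path.join(config_dir, "log4j.properties")
    if os.path.exists(legacy):
        return legacy, True   # warn: 1.x Properties format is deprecated
    return os.path.join(config_dir, "log4j2.yaml"), False
```

A freshly installed system without the legacy file picks up log4j2.yaml directly; an upgraded system keeps working with its old file until the operator migrates it.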


4. Data bindings and parsers are common sources of CVEs. It looks like
Snakeyaml is no exception (
https://www.cvedetails.com/version-list/0/66013/1/), though it doesn't look
much worse than Jackson. Just to point out, this will add a bit of
dependency overhead as we keep up with security patches.


That is an excellent question and a perfect motivation to introduce 
Vulnerability Exploitability eXchange between at least the ASF projects. 
I would love to do that, but right now there are no tools to automate 
that and the cost for maintainers would be astronomical. Hopefully in 
the mid-term future you could expect Logging Services to publish a VEX 
file. We already do it with VDR (which contains only our own 
vulnerabilities[2]), but that one can be maintained by hand.


SnakeYaml would be a deeply nested transitive dependency, but I believe 
we can trust the Jackson team to do the right thing:


* for two CVEs ([3] and [4]) the Jackson team decided to release a new 
version of `jackson-dataformat-yaml`.


* for one CVE that only concerned data binding, the Jackson team didn't 
have to do anything[5].


On the Logging Services side, we are not so thorough with 
vulnerabilities passed on by Jackson: we just bump the version of 
Jackson. It is probably worth noting that:


* The binding of the configuration file to Log4j components is always 
done in-house. From Jackson and other dependencies, we only use the tree 
parser.


* A side-effect of having a custom plugin system[6] is that only classes 
annotated with `@Plugin` can be instantiated.


* The XML configuration format is based on a JAXP DOM parser implementation.

* The Java Properties format is probably the most subject to security 
issues, since it uses an in-house tree parser[7] (obviously based on the 
Properties class). We will replace that bunch of code with 
`jackson-dataformat-properties` in 3.x[8], which should improve security 
by leaving the construction of the tree to parsing experts. The format 
will change slightly, which is also a reason I don't recommend Java 
Properties.


Piotr

[2] https://logging.apache.org/cyclonedx/vdr.xml

[3] https://github.com/FasterXML/jackson-dataformats-text/pull/328

[4] https://github.com/FasterXML/jackson-dataformats-text/issues/342

[5] https://github.com/FasterXML/jackson-dataformats-text/issues/382

[6] https://logging.apache.org/log4j/2.x/manual/plugins.html

[7] 
https://github.com/apache/logging-log4j2/blob/2.x/log4j-core/src/main/java/org/apache/logging/log4j/core/config/properties/PropertiesConfigurationBuilder.java


[8] 
https://logging.staged.apache.org/log4j/3.x/migrate-from-log4j2.html#properties-configuration-file





[jira] [Created] (KAFKA-17895) Expose KeyValueStore's approximateNumEntries as a metric

2024-10-29 Thread Jira
Clément MATHIEU created KAFKA-17895:
---

 Summary: Expose KeyValueStore's approximateNumEntries as a metric
 Key: KAFKA-17895
 URL: https://issues.apache.org/jira/browse/KAFKA-17895
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 3.8.1
Reporter: Clément MATHIEU


Tracking the evolution of a state store's size is often useful.

For example, we often use a state store to persist pending work and set an alert 
on its maximum size, because growth means the process is falling behind.

While KafkaStreams exposes many generic state-store-related and RocksDB-specific 
metrics, it does not expose KeyValueStore#approximateNumEntries, which is a key 
piece of information.

Is this an oversight, or was it a deliberate choice?

It would be great if this metric could be added. I assume that both the in-memory 
and RocksDB implementations of approximateNumEntries are fast enough to be used 
in metrics.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] MINOR: Move 3.8.0 to archived releases [kafka-site]

2024-10-29 Thread via GitHub


jlprat merged PR #638:
URL: https://github.com/apache/kafka-site/pull/638


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[ANNOUNCE] Apache Kafka 3.8.1

2024-10-29 Thread Josep Prat
The Apache Kafka community is pleased to announce the release for
Apache Kafka 3.8.1

This is a bug fix release and it includes fixes and improvements.

All of the changes in this release can be found in the release notes:
https://www.apache.org/dist/kafka/3.8.1/RELEASE_NOTES.html


An overview of the release can be found in our announcement blog post:
https://kafka.apache.org/blog#apache_kafka_381_release_announcement

You can download the source and binary release (Scala ) from:
https://kafka.apache.org/downloads#3.8.1

---


Apache Kafka is a distributed streaming platform with four core APIs:


** The Producer API allows an application to publish a stream of records to
one or more Kafka topics.

** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.

** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an
output stream to one or more output topics, effectively transforming the
input streams to output streams.

** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems. For example, a connector to a relational database might
capture every change to a table.


With these APIs, Kafka can be used for two broad classes of application:

** Building real-time streaming data pipelines that reliably get data
between systems or applications.

** Building real-time streaming applications that transform or react
to the streams of data.


Apache Kafka is in use at large and small companies worldwide, including
Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
Target, The New York Times, Uber, Yelp, and Zalando, among others.

A big thank you for the following 24 contributors to this release!
(Please report an unintended omission)

Andrew Schofield, Apoorv Mittal, Bill Bejeck, Bruno Cadonna, Chia-Ping
Tsai, Chris Egerton, Colin P. McCabe, David Arthur, Guillaume Mallet,
Igor Soarez, Josep Prat, Justine Olshan, Ken Huang, Kondrat Bertalan,
Kuan-Po Tseng, Luke Chen, Manikumar Reddy, Matthias J. Sax, Mickael
Maison, PoAn Yang, Rohan, TengYao Chi, Vikas Singh

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
https://kafka.apache.org/

Thank you!


Regards,

Josep Prat
Release Manager for Apache Kafka 3.8.1


[jira] [Created] (KAFKA-17896) Create Admin.describeClassicGroups

2024-10-29 Thread Andrew Schofield (Jira)
Andrew Schofield created KAFKA-17896:


 Summary: Create Admin.describeClassicGroups
 Key: KAFKA-17896
 URL: https://issues.apache.org/jira/browse/KAFKA-17896
 Project: Kafka
  Issue Type: Sub-task
Reporter: Andrew Schofield
Assignee: Andrew Schofield






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16248) Kafka consumer should cache leader offset ranges

2024-10-29 Thread Alieh Saeedi (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alieh Saeedi resolved KAFKA-16248.
--
Resolution: Fixed

> Kafka consumer should cache leader offset ranges
> 
>
> Key: KAFKA-16248
> URL: https://issues.apache.org/jira/browse/KAFKA-16248
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Lucas Brutschy
>Assignee: Alieh Saeedi
>Priority: Critical
>
> We noticed a streams application received an OFFSET_OUT_OF_RANGE error 
> following a network partition and streams task rebalance and subsequently 
> reset its offsets to the beginning.
> Inspecting the logs, we saw multiple consumer log messages like: 
> {code:java}
> Setting offset for partition tp to the committed offset 
> FetchPosition{offset=1234, offsetEpoch=Optional.empty...)
> {code}
> Inspecting the streams code, it looks like kafka streams calls `commitSync` 
> passing through an explicit OffsetAndMetadata object but does not populate 
> the offset leader epoch.
> The offset leader epoch is required in the offset commit to ensure that all 
> consumers in the consumer group have coherent metadata before fetching. 
> Otherwise after a consumer group rebalance, a consumer may fetch with a stale 
> leader epoch with respect to the committed offset and get an offset out of 
> range error from a zombie partition leader.
> An example of where this can cause issues:
> 1. We have a consumer group with consumer 1 and consumer 2. Partition P is 
> assigned to consumer 1 which has up-to-date metadata for P. Consumer 2 has 
> stale metadata for P.
> 2. Consumer 1 fetches partition P with offset 50, epoch 8. commits the offset 
> 50 without an epoch.
> 3. The consumer group rebalances and P is now assigned to consumer 2. 
> Consumer 2 has a stale leader epoch for P (let's say leader epoch 7). 
> Consumer 2 will now try to fetch with leader epoch 7, offset 50. If we have a 
> zombie leader due to a network partition, the zombie leader may accept 
> consumer 2's fetch leader epoch and return an OFFSET_OUT_OF_RANGE to consumer 
> 2.
> If in step 1, consumer 1 committed the leader epoch for the message, then 
> when consumer 2 receives assignment P it would force a metadata refresh to 
> discover a sufficiently new leader epoch for the committed offset.
> Kafka Streams cannot fully determine the leader epoch of the offsets it wants 
> to commit - in EOS mode, streams commits the offset after the last control 
> records (to avoid always having a lag of >0), but the leader epoch of the 
> control record is not known to streams (since only non-control records are 
> returned from Consumer.poll).
> A fix discussed with [~hachikuji] is to have the consumer cache leader epoch 
> ranges, similar to how the broker maintains a leader epoch cache.
> This ticket was split from the original ticket 
> https://issues.apache.org/jira/browse/KAFKA-15344 which was described as a 
> streams fix, but the problem cannot be fully fixed in streams.
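
The failure sequence in steps 1-3 can be sketched as a toy model (all names here, `Committed` and `resume_fetch`, are hypothetical and not Kafka client APIs; this only illustrates why the committed leader epoch matters):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Committed:
    offset: int
    leader_epoch: Optional[int]  # None when the epoch was not committed

def resume_fetch(committed, cached_epoch):
    # A committed epoch newer than the consumer's cached epoch forces a
    # metadata refresh, so the fetch goes to the real leader. Without a
    # committed epoch, the consumer fetches with its stale cached epoch,
    # and a zombie leader may answer OFFSET_OUT_OF_RANGE.
    if committed.leader_epoch is not None and committed.leader_epoch > cached_epoch:
        return "refresh-metadata-then-fetch"
    return "fetch-with-stale-epoch"

# Consumer 1 committed offset 50 at leader epoch 8; consumer 2 has cached epoch 7.
print(resume_fetch(Committed(50, None), cached_epoch=7))  # fetch-with-stale-epoch
print(resume_fetch(Committed(50, 8), cached_epoch=7))     # refresh-metadata-then-fetch
```

In the first call the epoch was dropped at commit time, reproducing the offset reset described above; in the second, the committed epoch forces the metadata refresh.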



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Apache Kafka 3.8.1

2024-10-29 Thread 黃竣陽
Hello Josep

Thanks for your hard work running this release!

Best regards,
Jiunn-Yang

> TengYao Chi  於 2024年10月30日 凌晨12:35 寫道:
> 
> Hi Josep
> Thanks for running this release
> 
> Cheers,
> TengYao
> 
> Apoorv Mittal  於 2024年10月30日 週三 上午12:18寫道:
> 
>> Thanks Josep for all the hard work for the release.
>> 
>> Regards,
>> Apoorv Mittal
>> 
>> 
>> 
>> On Tue, Oct 29, 2024 at 2:51 PM Mickael Maison 
>> wrote:
>> 
>>> Thanks Josep for running this release!
>>> 
>>> Mickael
>>> 
>>> On Tue, Oct 29, 2024 at 3:23 PM Josep Prat  wrote:
 



[jira] [Created] (KAFKA-17897) Deprecate Admin.listConsumerGroups

2024-10-29 Thread Andrew Schofield (Jira)
Andrew Schofield created KAFKA-17897:


 Summary: Deprecate Admin.listConsumerGroups
 Key: KAFKA-17897
 URL: https://issues.apache.org/jira/browse/KAFKA-17897
 Project: Kafka
  Issue Type: Sub-task
Reporter: Andrew Schofield
Assignee: Andrew Schofield






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] MINOR: Move 3.8.0 to archived releases [kafka-site]

2024-10-29 Thread via GitHub


jlprat opened a new pull request, #638:
URL: https://github.com/apache/kafka-site/pull/638

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: Move 3.8.0 to archived releases [kafka-site]

2024-10-29 Thread via GitHub


jlprat commented on PR #638:
URL: https://github.com/apache/kafka-site/pull/638#issuecomment-2444510344

   ping @mimaison 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [ANNOUNCE] Apache Kafka 3.8.1

2024-10-29 Thread TengYao Chi
Hi Josep
Thanks for running this release

Cheers,
TengYao

Apoorv Mittal  於 2024年10月30日 週三 上午12:18寫道:

> Thanks Josep for all the hard work for the release.
>
> Regards,
> Apoorv Mittal
>
>
>
> On Tue, Oct 29, 2024 at 2:51 PM Mickael Maison 
> wrote:
>
> > Thanks Josep for running this release!
> >
> > Mickael
> >
> > On Tue, Oct 29, 2024 at 3:23 PM Josep Prat  wrote:
> > >
> > > Igor Soarez, Josep Prat, Justine Olshan, Ken Huang, Kondrat Bertalan,
> > > Kuan-Po Tseng, Luke Chen, Manikumar Reddy, Matthias J. Sax, Mickael
> > > Maison, PoAn Yang, Rohan, TengYao Chi, Vikas Singh
> > >
> > > We welcome your help and feedback. For more information on how to
> > > report problems, and to get involved, visit the project website at
> > > https://kafka.apache.org/
> > >
> > > Thank you!
> > >
> > >
> > > Regards,
> > >
> > > Josep Prat
> > > Release Manager for Apache Kafka 3.8.1
> >
>


[jira] [Created] (KAFKA-17894) Additional metrics for cooperative consumption

2024-10-29 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-17894:
-

 Summary: Additional metrics for cooperative consumption
 Key: KAFKA-17894
 URL: https://issues.apache.org/jira/browse/KAFKA-17894
 Project: Kafka
  Issue Type: New Feature
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal


https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] MINOR: Move 3.8.0 to archived releases [kafka-site]

2024-10-29 Thread via GitHub


mimaison commented on code in PR #638:
URL: https://github.com/apache/kafka-site/pull/638#discussion_r1821100757


##
downloads.html:
##
@@ -145,6 +110,41 @@ 3.6.2Archived releases
 
+
+3.8.0
+
+
+Released July 29, 2024
+
+
+https://archive.apache.org/dist/kafka/3.8.0/RELEASE_NOTES.html";>Release 
Notes
+
+
+Docker image: https://hub.docker.com/layers/apache/kafka/3.8.0/images/sha256-c9aea96a4813e77e703541b1d8f7d58c9ee05b77353da33684db55c840548791";>apache/kafka:3.8.0.
+
+
+Docker Native image: https://hub.docker.com/layers/apache/kafka-native/3.8.0/images/sha256-e1b3af1f501bb1d0c2dc11ce4fb04d0132568c9da18232bdd25643b587599ded";>apache/kafka-native:3.8.0.
+
+
+Source download: https://archive.apache.org/dist/kafka/3.8.0/kafka-3.8.0-src.tgz";>kafka-3.8.0-src.tgz
 (https://archive.apache.org/dist/kafka/3.8.0/kafka-3.8.0-src.tgz.asc";>asc,
 https://archive.apache.org/dist/kafka/3.7.0/kafka-3.8.0-src.tgz.sha512";>sha512)

Review Comment:
   Typo in 
https://archive.apache.org/dist/kafka/3.7.0/kafka-3.8.0-src.tgz.sha512, it 
should be `3.8.0` instead of `3.7.0`






[jira] [Created] (KAFKA-17892) Update README after migration to log4j2

2024-10-29 Thread TengYao Chi (Jira)
TengYao Chi created KAFKA-17892:
---

 Summary: Update README after migration to log4j2
 Key: KAFKA-17892
 URL: https://issues.apache.org/jira/browse/KAFKA-17892
 Project: Kafka
  Issue Type: Improvement
Reporter: TengYao Chi
Assignee: TengYao Chi


see discussion: 
https://github.com/apache/kafka/pull/17373#discussion_r1813405127

Since the current example of log4j in the README relies on a file in the trunk 
branch, we should update the README after migrating to log4j2.

https://github.com/apache/kafka/blob/984777f0b952b6e1629d3b16ce1f196d54278c3e/README.md?plain=1#L57



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [DOCS] Update Javadoc 3.8.1 [kafka-site]

2024-10-29 Thread via GitHub


jlprat merged PR #636:
URL: https://github.com/apache/kafka-site/pull/636





[DISCUSS] KIP-1103: Additional metrics for cooperative consumption

2024-10-29 Thread Apoorv Mittal
Hi Everyone,
I would like to start a discussion on KIP-1103:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption

This KIP extends KIP-932 to provide additional metrics for
Queues/Cooperative consumption.

Regards,
Apoorv Mittal


Re: [PR] [Docs] Update site for 3.8.1 [kafka-site]

2024-10-29 Thread via GitHub


jlprat commented on PR #635:
URL: https://github.com/apache/kafka-site/pull/635#issuecomment-2444520341

   Here there is the PR: https://github.com/apache/kafka-site/pull/638





Re: [PR] [Docs] Update site for 3.8.1 [kafka-site]

2024-10-29 Thread via GitHub


jlprat commented on PR #635:
URL: https://github.com/apache/kafka-site/pull/635#issuecomment-295088

   I guess so, as 3.8.1 superseded 3.8.0. I'll push a PR changing it





Re: [ANNOUNCE] Apache Kafka 3.8.1

2024-10-29 Thread Mickael Maison
Thanks Josep for running this release!

Mickael

On Tue, Oct 29, 2024 at 3:23 PM Josep Prat  wrote:
>
> The Apache Kafka community is pleased to announce the release for
> Apache Kafka 3.8.1
>
> This is a bug fix release and it includes fixes and improvements.
>
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/3.8.1/RELEASE_NOTES.html
>
>
> An overview of the release can be found in our announcement blog post:
> https://kafka.apache.org/blog#apache_kafka_381_release_announcement
>
> You can download the source and binary release (Scala ) from:
> https://kafka.apache.org/downloads#3.8.1
>
> ---
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input streams to output streams.
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
> ** Building real-time streaming applications that transform or react
> to the streams of data.
>
>
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
>
> A big thank you for the following 24 contributors to this release!
> (Please report an unintended omission)
>
> Andrew Schofield, Apoorv Mittal, Bill Bejeck, Bruno Cadonna, Chia-Ping
> Tsai, Chris Egerton, Colin P. McCabe, David Arthur, Guillaume Mallet,
> Igor Soarez, Josep Prat, Justine Olshan, Ken Huang, Kondrat Bertalan,
> Kuan-Po Tseng, Luke Chen, Manikumar Reddy, Matthias J. Sax, Mickael
> Maison, PoAn Yang, Rohan, TengYao Chi, Vikas Singh
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kafka.apache.org/
>
> Thank you!
>
>
> Regards,
>
> Josep Prat
> Release Manager for Apache Kafka 3.8.1


Re: [PR] MINOR: Move 3.8.0 to archived releases [kafka-site]

2024-10-29 Thread via GitHub


jlprat commented on code in PR #638:
URL: https://github.com/apache/kafka-site/pull/638#discussion_r1821110671


##
downloads.html:
##
@@ -145,6 +110,41 @@ 3.6.2Archived releases
 
+
+3.8.0
+
+
+Released July 29, 2024
+
+
+https://archive.apache.org/dist/kafka/3.8.0/RELEASE_NOTES.html";>Release 
Notes
+
+
+Docker image: https://hub.docker.com/layers/apache/kafka/3.8.0/images/sha256-c9aea96a4813e77e703541b1d8f7d58c9ee05b77353da33684db55c840548791";>apache/kafka:3.8.0.
+
+
+Docker Native image: https://hub.docker.com/layers/apache/kafka-native/3.8.0/images/sha256-e1b3af1f501bb1d0c2dc11ce4fb04d0132568c9da18232bdd25643b587599ded";>apache/kafka-native:3.8.0.
+
+
+Source download: https://archive.apache.org/dist/kafka/3.8.0/kafka-3.8.0-src.tgz";>kafka-3.8.0-src.tgz
 (https://archive.apache.org/dist/kafka/3.8.0/kafka-3.8.0-src.tgz.asc";>asc,
 https://archive.apache.org/dist/kafka/3.7.0/kafka-3.8.0-src.tgz.sha512";>sha512)

Review Comment:
   fixed






Re: [DISCUSS] KIP-1058: Txn consumer exerts pressure on remote storage when reading non-txn topic

2024-10-29 Thread Divij Vaidya
A few more points to discuss (please add to the KIP as well)

5. How are we determining the value of the TrxIndexEmpty field on segment
rotation?

One option is to compute boolean txnIdxEmpty =
segment.txnIndex().allAbortedTxns().isEmpty(), but this has the overhead
of reading the contents of the file and holding them in memory whenever
the index is non-empty.
The other option (preferred) is to add a public isEmpty() method to
TransactionIndex and perform a segment.txnIndex().isEmpty() check, which
internally uses the Files.size() Java API.
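To make the preferred option concrete, here is a minimal sketch of an isEmpty() check backed by Files.size(). The class and constructor here are illustrative stand-ins, not Kafka's actual TransactionIndex API; the point is only that a file-size check touches file metadata, not file contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a transaction index backed by a file on disk.
public class TransactionIndexSketch {
    private final Path file;

    public TransactionIndexSketch(Path file) {
        this.file = file;
    }

    // True when the on-disk index holds no aborted-transaction entries.
    // Files.size() reads only file metadata, so no index content is
    // materialized in memory (unlike allAbortedTxns().isEmpty()).
    public boolean isEmpty() throws IOException {
        return !Files.exists(file) || Files.size(file) == 0;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("txn-index", ".idx");
        TransactionIndexSketch idx = new TransactionIndexSketch(tmp);
        System.out.println(idx.isEmpty()); // freshly created file is empty
        Files.write(tmp, new byte[]{1, 2, 3});
        System.out.println(idx.isEmpty()); // false once bytes are written
        Files.deleteIfExists(tmp);
    }
}
```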

On Tue, Oct 29, 2024 at 1:21 PM Divij Vaidya 
wrote:

> Let's get the ball rolling (again) on this one.
>
> Kamal, could you please add the following to the KIP:
> 1. the API as discussed above. Please add the failure modes for this API
> as well such as the exceptions thrown and a recommendation on how a caller
> is expected to handle those. I am assuming that the three parameters for
> this API will be topicPartition, epoch and offset.
> 2. implementation details for Topic based RLMM. I am assuming that the
> plugin will default the field to false if this field is absent (case of old
> metadata).
> 3. In the test plan section, additionally, we need to assert that we don't
> read metadata for all segments (i.e. it is not a linear search) from the
> Topic based RLMM.
> 4. in the compatibility section, please document how the existing clusters
> with Tiered Storage metadata will work during/after a rolling upgrade to a
> version which contains this new change.
>
> --
> Divij Vaidya
>
>
>
> On Fri, Oct 11, 2024 at 12:26 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
>> Bump for review.
>>
>> If the additional proposal looks good, I'll append them to the KIP. PTAL.
>>
>> New API in RLMM#nextRemoteLogSegmentMetadataWithTxnIndex
>>
>> --
>> Kamal
>>
>> On Sun, Oct 6, 2024 at 7:20 PM Kamal Chandraprakash <
>> kamal.chandraprak...@gmail.com> wrote:
>>
>> > Hi Christo,
>> >
>> > Thanks for the review!
>> >
>> > Adding the new API `nextRemoteLogSegmentMetadataWithTxnIndex` in RLMM
>> > helps to
>> > reduce the complexity of linear search. With this API, we have to:
>> >
>> > 1. Maintain one more skip-list [1] for each of the epochs in the
>> partition
>> > in RLMM that might
>> > increase the memory usage of TopicBased RLMM implementation.
>> > 1a) The skip-list will be empty when there are no aborted txn
>> entries
>> > for a partition/epoch which is the predominant case.
>> > 1b) The skip-list will act as a duplicate when *most* of the
>> segments
>> > have aborted txn entries, assuming aborted txn are quite low, this
>> should
>> > be fine.
>> > 2. Change the logic to retrieve the aborted txns (we have to query the
>> > nextRLSMWithTxnIndex
>> > for each of the leader-epoch).
>> > 3. Logic divergence from how we retrieve the aborted txn entries
>> compared
>> > to the local-log.
>> >
>> > The approach looks good to me. If everyone is aligned, then we can
>> proceed
>> > to add this API to RLMM.
>> >
>> > Another option I was thinking of is to capture the `lastStableOffsetLag`
>> > [2] while rotating the segment.
>> > But, that is a bigger change we can take later.
>> >
>> > [1]:
>> >
>> https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogLeaderEpochState.java?L43
>> > [2]:
>> >
>> https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L432
>> >
>> >
>> > Thanks,
>> > Kamal
>> >
>> > On Fri, Oct 4, 2024 at 4:21 PM Christo Lolov 
>> > wrote:
>> >
>> >> Heya,
>> >>
>> >> Apologies for the delay. I have been thinking about this problem
>> recently
>> >> as well and while I believe storing a boolean in the metadata is good,
>> I
>> >> think we can do better by introducing a new method to the RLMM along
>> the
>> >> lines of
>> >>
>> >> Optional<RemoteLogSegmentMetadata>
>> >> nextRemoteLogSegmentMetadataWithTxnIndex(TopicIdPartition topicIdPartition,
>> >> int epochForOffset, long offset) throws RemoteStorageException
>> >>
>> >> This will help plugin implementers to build optimisations such as skip
>> >> lists which will give them the next segment quicker than a linear
>> search.
>> >>
>> >> I am keen to hear your thoughts!
>> >>
>> >> Best,
>> >> Christo
>> >>
>> >> On Fri, 4 Oct 2024 at 10:48, Kamal Chandraprakash <
>> >> kamal.chandraprak...@gmail.com> wrote:
>> >>
>> >> > Hi Luke,
>> >> >
>> >> > Thanks for the review!
>> >> >
>> >> > > Do you think it is helpful if we store the "least abort start
>> offset
>> >> in
>> >> > the
>> >> > segment", and -1 means no txnIndex. So that we can have a way to know
>> >> if we
>> >> > need to fetch this txn index or not.
>> >> >
>> >> > 1. No, this change won't have an effect. To find the upper-bound
>> offset
>> >> > [1], we have to
>> >> > fetch that segment's offset index file. The RemoteIndexCache [2]
>> >> > fetches all the 3
>> >> > index files together and caches them
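As a rough illustration of the skip-list idea discussed above, a per-epoch map keyed by segment start offset could answer the "next segment with a txn index" query without a linear scan over all segment metadata. All names here are assumptions for illustration, not the Topic-based RLMM's actual internals, and the ceiling lookup is simplified (a real implementation would also consider the segment containing the offset, not only segments starting at or after it).

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical per-epoch index: only segments whose transaction index is
// non-empty are inserted, so lookups skip txn-free segments entirely.
public class TxnSegmentLookupSketch {
    // startOffset -> segment id, restricted to segments with a txn index
    private final ConcurrentSkipListMap<Long, String> segmentsWithTxnIndex =
            new ConcurrentSkipListMap<>();

    public void add(long startOffset, String segmentId) {
        segmentsWithTxnIndex.put(startOffset, segmentId);
    }

    // Next segment starting at or after the given offset that carries a
    // txn index; O(log n) via the skip list's ceilingEntry.
    public Optional<String> nextWithTxnIndex(long offset) {
        Map.Entry<Long, String> e = segmentsWithTxnIndex.ceilingEntry(offset);
        return e == null ? Optional.empty() : Optional.of(e.getValue());
    }

    public static void main(String[] args) {
        TxnSegmentLookupSketch lookup = new TxnSegmentLookupSketch();
        lookup.add(100L, "seg-100");
        lookup.add(500L, "seg-500");
        System.out.println(lookup.nextWithTxnIndex(150L).orElse("none"));
        System.out.println(lookup.nextWithTxnIndex(900L).orElse("none"));
    }
}
```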

Re: [PR] [Docs] Update site for 3.8.1 [kafka-site]

2024-10-29 Thread via GitHub


mimaison commented on PR #635:
URL: https://github.com/apache/kafka-site/pull/635#issuecomment-267996

   Should we move 3.8.0 out of the supported releases?





Re: [DISCUSS] KIP-1091: Improved Kafka Streams operator metrics

2024-10-29 Thread Apoorv Mittal
Hi Bill,
Thanks for the KIP. I have a request, can we please define MBean and
telemetry names for the new metrics so it's easier to see where they will
be added (group, tags etc.)

Regards,
Apoorv Mittal


On Tue, Oct 29, 2024 at 5:15 AM Sophie Blee-Goldman 
wrote:

> Hey Bill,
>
> Thanks for the KIP! That all makes sense to me, just one minor note: while
> you mentioned the TRACE recording level in the Motivation section, it seems
> to be missing from the table in the Public Interfaces section. I assume
> this will also be included, presumably with a value of 2?
>
> Cheers,
> Sophie
>
> On Fri, Oct 25, 2024 at 12:20 PM Bill Bejeck  wrote:
>
> > Hi All,
> >
> > I would like to start a discussion thread on KIP-1091:Improved Kafka
> > Streams operator metrics
> >
> > Here's a link to the KIP:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1091%3A+Improved+Kafka+Streams+operator+metrics
> >
> > I look forward to the discussion.
> >
> > Thanks,
> > Bill
> >
>


Re: [VOTE] 3.9.0 RC2

2024-10-29 Thread Colin McCabe
On Tue, Oct 29, 2024, at 02:38, Anton Agestam wrote:

>
> Because rather than forcing API designers to specify good defaults,
> [implicit defaults] force protocol implementers to inject bad defaults.
>
> The good examples from this thread are host and port, empty string is not a
> valid hostname, and zero is not a valid port, so those are very bad default
> values. Similarly, for timestamp fields the resulting value will be epoch,
> which extremely rarely is a useful default.
>

In all the cases I can remember, we actually wanted an invalid port and invalid 
hostname to be the default, so that we could clearly differentiate the case 
where a port and hostname were sent back from the case where they were not. So 
in this particular case the implicit defaults are actually exactly what we want.

If we wanted to get rid of implicit defaults and make everything explicit, we 
would have to introduce a richer syntax for initializing some types, like 
UUIDs, collections, etc. Collections alone would be daunting since the problem 
is recursive (I can have a collection containing other collections, etc.)
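As a toy illustration of the space argument: a writer that skips tagged fields equal to their implicit defaults emits zero bytes for host="" and port=0. The tags, length prefixes, and encoding below are hypothetical and much simpler than Kafka's actual flexible-versions wire format; the sketch only shows why omitting default-valued tagged fields saves bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical tagged-field writer, not Kafka's generated serialization code.
public class TaggedFieldSketch {
    static final String DEFAULT_HOST = "";   // implicit default for string
    static final int DEFAULT_PORT = 0;       // implicit default for int32

    // Writes only non-default tagged fields; returns the bytes written.
    static int writeTaggedFields(ByteBuffer buf, String host, int port) {
        int start = buf.position();
        if (!host.equals(DEFAULT_HOST)) {
            buf.put((byte) 0);                              // tag 0 = host
            byte[] bytes = host.getBytes(StandardCharsets.UTF_8);
            buf.put((byte) bytes.length);                   // simplified length prefix
            buf.put(bytes);
        }
        if (port != DEFAULT_PORT) {
            buf.put((byte) 1);                              // tag 1 = port
            buf.putInt(port);
        }
        return buf.position() - start;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        // Both fields at their defaults: nothing is written at all.
        int defaults = writeTaggedFields(buf, "", 0);
        // Non-default values cost tag + payload bytes.
        int nonDefault = writeTaggedFields(buf, "b1", 9092);
        System.out.println(defaults + " " + nonDefault);
    }
}
```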

best,
Colin


>
> Cheers,
> Anton
>
> On Mon, Oct 28, 2024 at 17:02, Colin McCabe wrote:
>
>> On Sun, Oct 27, 2024, at 01:44, Anton Agestam wrote:
>> > Colin
>> >
>> > I have presented four reasons, I'll list them again below. Please let me
>> > know which ones there isn't already enough information on the thread
>> > already.
>> >
>> > - The behavior is new.
>>
>> Hi Anton,
>>
>> This behavior isn't new. I gave an example of tagged fields that have an
>> implicit default in 3.8 and earlier.
>>
>> > - The behavior is undocumented.
>>
>> It seems like we both agree that the implicit defaults are documented. I
>> showed you where in the README.md they are discussed. That section is from
>> 2019. Perhaps the disagreement is that you assumed that they didn't apply
>> to tagged fields, whereas I assumed that it was obvious that they did.
>>
>> It looks like Chia-Ping Tsai has opened a JIRA to clarify that implicit
>> defaults do indeed apply to tagged fields. I think this will help avoid
>> confusion in the future.
>>
>> > - The behavior is bad API design.
>>
>> Why is it bad API design?
>>
>> > - The behavior does not really save bytes *in practice*.
>>
>> The example you gave shows that the current behavior sends less over the
>> wire than your proposed change. Those are not theoretical bytes, they are
>> actual bytes.
>>
>> Saving space on the wire for fields that were not often used was one of
>> the explicit goals of the tagged fields KIP, which was KIP-482. As it says
>> in the "motivation" section:
>>
>>  > While [the current] versioning scheme allows us to change the message
>> schemas over
>>  > time, there are many scenarios that it doesn't support well.  One
>> scenario
>>  > that isn't well-supported is when we have data that should be sent in
>> some
>>  > contexts, but not others.  For example, when a MetadataRequest is made
>>  > with IncludeClusterAuthorizedOperations set to true, we need to include
>>  > the authorized operations in the response.  However, even when
>>  > IncludeClusterAuthorizedOperations is set to false, we still must waste
>>  > bandwidth sending a set of blank authorized operations fields in the
>>  > response.  The problem is that the field is semantically optional
>> in
>>  > the message, but that can't be expressed in the type system for the
>>  > Kafka RPC protocol.
>>
>> You can read it here: https://cwiki.apache.org/confluence/x/OhMyBw
>>
>> Obviously sending defaults over the wire, in cases where this is not
>> needed, goes against that.
>>
>> >
>> > I don't see why *fixing* the release candidate to not break documented
>> > behavior should require a KIP, I would actually expect the opposite --
>> the
>> > new behavior that is being introduced should really have required one.
>> >
>> >> These two behaviors, taken together, save space on the wire
>> >
>> > Then you are implicitly arguing that the combination of host="" port=0
>> > are common enough that this will practically save bytes on the wire, I
>> find
>> > that hard to believe.
>> >
>> > For any future schema that we want to save bytes, there is just as much
>> > opportunity to save bytes on the wire with my proposal, they just have to
>> > explicitly define default nested values in order to do so.
>>
>> As I said, there is nothing special about 3.9. This behavior has always
>> existed.
>>
>> If you really want to force everyone to explicitly declare a default for
>> each field, then just introduce a KIP to do that. I wouldn't vote for it (I
>> still don't see why this is better), but this would at least follow our
>> usual process.
>>
>> One of the problems with forcing an explicit default everywhere is that we
>> don't really have a syntax for specifying that the default should be the
>> empty collection. For collections, the only choice you get is explicitly
>> declaring that the default is null.
>>
>> best,

Re: [DISCUSS] Require KIPs to include "How to teach this section"

2024-10-29 Thread Colin McCabe
Hi Anton,

Perhaps there should be a "documentation" section in the KIP template? That 
might help raise awareness of the need to document these changes.

I want to add, in the case of the wire protocol, the KIPs themselves are part 
of the documentation. But they shouldn't be all of the documentation. We should 
update protocol.html and possibly other docs in cases where we're changing the 
protocol. I don't think it necessarily needs to be done before the change itself, 
but it should be done so that the release that includes the protocol changes 
also includes their docs...

best,
Colin


On Sat, Oct 26, 2024, at 10:53, Anton Agestam wrote:
> Hello Kafka devs 👋
>
> Colin encouraged me in the 3.9.0 RC2 thread to contribute ideas around how
> the protocol documentation can be improved. While I have concrete ideas on
> this, the current biggest issue as I see it is that new changes are not
> making it into documentation, and there seems to be a bit of a general lack
> of process with regards to this issue.
>
> KIP-893 was a very poignant example of this. It introduces a new concept in
> the protocol's byte serialization format, none of which made it into
> documentation. This was extremely subtle and very time consuming to debug
> for me as an author of a third-party protocol implementation in Python
> that must remain compatible with Apache Kafka. Based on the view of the
> existing documentation, this flat out looked like a bug.
>
> The Python ecosystem solves this issue by requiring PEPs to have a "How to
> teach this" section, forcing documentation to not be an afterthought. I am
> proposing to introduce the exact same concept for KIPs. I believe this will
> be useful for all KIPs, not just those of the sort mentioned above.
>
> For changes to the protocol, I will also suggest that it should be required
> for specification to be updated _before_ implementation changes are merged,
> but this should perhaps be a separate discussion.
>
> Forcing us to include a plan for documentation in all future KIPs is also a
> solid strategy to incrementally improve documentation over time. Kafka's
> docs are lacking also in other regards not related to the protocol. The
> proposed change will make sure we have a net positive development over
> time, towards a greater state of Kafka docs.
>
> Without first ensuring that there is a mechanism like this that makes sure
> documentation does not rot over time, it doesn't seem like the best
> investment of time to improve documentation of the current state. For the
> continued success of Kafka this is highly important because it is what
> enables a thriving ecosystem.
>
> - Have there been something similar discussed previously?
> - What do you all think of this proposal?
>
> BR,
> Anton
>
> -- 
> [image: Aiven] 
> *Anton Agestam* (he/him or they/them)
> Software Engineer, *Aiven*
> anton.ages...@aiven.io   |   +46 704 486 289
> aiven.io


Re: [VOTE] 3.8.1 RC1

2024-10-29 Thread Josep Prat
Hi all,
Thanks to all the reviewers!

This vote passes with 7 +1 votes (3 bindings) and no 0 or -1 votes.

+1 votes
PMC Members:
* Mickael Maison
* Luke Chen
* Chia-Ping Tsai

Community:
* TengYao Chi
* Jiunn-Yang
* Federico Valeri
* Jakub Scholz

0 votes
* No votes

-1 votes
* No votes

I'll continue with the release process and the release announcement will
follow in the next few days.

Best,

--
Josep Prat
Open Source Engineering Director, Aiven
josep.p...@aiven.io   |   +491715557497 | aiven.io
Aiven Deutschland GmbH
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
Anna Richardson, Kenneth Chen
Amtsgericht Charlottenburg, HRB 209739 B

On Tue, Oct 29, 2024, 01:12 Jakub Scholz  wrote:

> +1 (non-binding) ... I used the staged Scala 2.13 binaries and Maven
> artifacts and run my tests. All seems to work fine. Thanks for the release.
>
> Jakub
>
> On Thu, Oct 17, 2024 at 10:27 PM Josep Prat 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the second release candidate of Apache Kafka 3.8.1.
> >
> > This is a bugfix release with several fixes.
> >
> > Release notes for the 3.8.1 release:
> >
> https://dist.apache.org/repos/dist/dev/kafka/3.8.1-rc1/RELEASE_NOTES.html
> >
> > Please download, test and vote by Tuesday, October 22, 9am ET
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://dist.apache.org/repos/dist/dev/kafka/3.8.1-rc1/
> >
> > * Docker release artifacts to be voted upon:
> > apache/kafka:3.8.1-rc1
> > apache/kafka-native:3.8.1-rc1
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://dist.apache.org/repos/dist/dev/kafka/3.8.1-rc1/javadoc/
> >
> > * Tag to be voted upon (off 3.8 branch) is the 3.8.1 tag:
> > https://github.com/apache/kafka/releases/tag/3.8.1-rc1
> >
> > * Documentation:
> > Mind that the home.apache.org server is retired now.
> > https://kafka.apache.org/38/documentation.html
> > And https://github.com/apache/kafka-site/pull/635
> >
> > * Protocol:
> > https://kafka.apache.org/38/protocol.html
> > And https://github.com/apache/kafka-site/pull/635
> >
> > * Jenkins builds for the 3.8 branch:
> > Unit/integration tests: There are some flaky tests, with the combination
> of
> > these 4 builds all tests passed at least once:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.8/103/testReport/
> > (latest build)
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.8/101/testReport/
> ,
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.8/102/testReport/
> > and
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.8/97/testReport/
> > All tests pass locally
> >
> > System tests: Between these 2 runs all tests were successful:
> >
> >
> https://confluent-open-source-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.8/2024-10-07--001.af519a09-fdc8-4d46-8478-e0280854e43e--1728373295--confluentinc--3.8--7dbc44143a/report.html
> >
> >
> https://confluent-open-source-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/trunk/2024-10-01--001.e7b0a1be-bac1-4792-96da-ec94116e20ce--1727846843--confluentinc--3.8--99746d683a/report.html
> >
> > * Successful Docker Image Github Actions Pipeline for 3.8 branch:
> > Docker Build Test Pipeline (JVM):
> > https://github.com/apache/kafka/actions/runs/11390962530
> > Docker Build Test Pipeline (Native):
> > https://github.com/apache/kafka/actions/runs/11391548205
> >
> > /**
> >
> > Thanks,
> >
> > --
> > [image: Aiven] 
> >
> > *Josep Prat*
> > Open Source Engineering Director, *Aiven*
> > josep.p...@aiven.io   |   +491715557497
> > aiven.io | https://www.facebook.com/aivencloud | https://twitter.com/aiven_io
> > *Aiven Deutschland GmbH*
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > Anna Richardson, Kenneth Chen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
>


Re: [VOTE] KIP-1043: Administration of groups

2024-10-29 Thread David Jacot
Thanks for the KIP and the good discussion. +1 (binding)

Best,
DJ

On Wed, Oct 2, 2024 at 3:16 AM David Arthur  wrote:

> +1 binding
>
> Cheers,
> David A
>
> On Mon, Sep 23, 2024 at 6:43 AM Apoorv Mittal 
> wrote:
>
> > Hi Andrew,
> > Thanks for the KIP, this will be very helpful.
> >
> > +1 (non-binding)
> >
> > Regards,
> > Apoorv Mittal
> >
> >
> > On Mon, Sep 23, 2024 at 11:38 AM Lianet M.  wrote:
> >
> > > Hello Andrew,
> > >
> > > Thanks for the KIP.
> > >
> > > +1 (binding)
> > >
> > > Lianet
> > >
> > > On Mon, Sep 23, 2024, 3:37 a.m. Lucas Brutschy
> > >  wrote:
> > >
> > > > Hi Andrew,
> > > >
> > > > thanks for the KIP!
> > > >
> > > > +1 (binding)
> > > >
> > > > Cheers,
> > > > Lucas
> > > >
> > > > On Mon, Sep 23, 2024 at 9:27 AM Andrew Schofield
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > > I would like to start voting for KIP-1043: Administration of
> groups.
> > > > This KIP enhances the command-line tools to make it easier to
> > administer
> > > > groups on clusters with a variety of types of groups.
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1043%3A+Administration+of+groups
> > > > >
> > > > > Thanks.
> > > > > Andrew
> > > >
> > >
> >
>
>
> --
> David Arthur
>


Re: [VOTE] 3.9.0 RC5

2024-10-29 Thread Anton Agestam
Hi Chia-Ping,

Thanks for pointing those two fields out. I retract my -1.

Cheers,
Anton

On Sun, Oct 27, 2024 at 5:40 PM Chia-Ping Tsai wrote:

> hi Anton
>
> Thanks for sharing your insights on Kafka serialization—it’s really cool
> and interesting to me. Additionally, you inspired me to file a JIRA issue
> (KAFKA-17882) to improve the documentation.
>
> The most important aspect of Kafka is compatibility, and the undocumented
> behavior has been in place for some time [0][1]. This means there’s no need
> to rush your improvement for 3.9, as we’ll need to explicitly add default
> values after applying your patch to ensure we generate the same binary data.
>
> In short, we can improve the documentation first. In the meantime, we can
> continue discussing behavior clarification for 4.0, and RM can keep running
> the RC for 3.9. Everything is on track.
>
> Best,
> Chia-Ping
>
> [0]
> https://github.com/apache/kafka/blob/3.8/clients/src/main/resources/common/message/FetchSnapshotResponse.json#L43
> [1]
> https://github.com/apache/kafka/blob/3.8/group-coordinator/src/main/resources/common/message/ConsumerGroupMemberMetadataValue.json#L39
>
> On 2024/10/27 15:28:05 Anton Agestam wrote:
> > -1, refer to comments on the RC 2 thread.
> >
> > On Sun, Oct 27, 2024 at 2:51 AM Colin McCabe wrote:
> >
> > > This is the RC5 candidate for the release of Apache Kafka 3.9.0.
> > >
> > > - This is a major release, the final one in the 3.x line. (There may of
> > > course be other minor releases in this line, such as 3.9.1.)
> > > - Tiered storage will be considered production-ready in this release.
> > > - This will be the final major release to feature the deprecated
> ZooKeeper
> > > mode.
> > >
> > > This release includes the following KIPs:
> > > - KIP-853: Support dynamically changing KRaft controller membership
> > > - KIP-1057: Add remote log metadata flag to the dump log tool
> > > - KIP-1049: Add config log.summary.interval.ms to Kafka Streams
> > > - KIP-1040: Improve handling of nullable values in InsertField,
> > > ExtractField, and other transformations
> > > - KIP-1031: Control offset translation in MirrorSourceConnector
> > > - KIP-1033: Add Kafka Streams exception handler for exceptions
> occurring
> > > during processing
> > > - KIP-1017: Health check endpoint for Kafka Connect
> > > - KIP-1025: Optionally URL-encode clientID and clientSecret in
> > > authorization header
> > > - KIP-1005: Expose EarliestLocalOffset and TieredOffset
> > > - KIP-950: Tiered Storage Disablement
> > > - KIP-956: Tiered Storage Quotas
> > >
> > > Release notes for the 3.9.0 release:
> > >
> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc5/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by October 30, 2024.
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc5/
> > >
> > > * Docker release artifacts to be voted upon:
> > > apache/kafka:3.9.0-rc5
> > > apache/kafka-native:3.9.0-rc5
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc5/javadoc/
> > >
> > > * Documentation:
> > > https://kafka.apache.org/39/documentation.html
> > >
> > > * Protocol:
> > > https://kafka.apache.org/39/protocol.html
> > >
> > > * Tag to be voted upon (off 3.9 branch) is the 3.9.0-rc5 tag:
> > > https://github.com/apache/kafka/releases/tag/3.9.0-rc5
> > >
> > > * Successful Docker Image Github Actions Pipeline for 3.9 branch:
> > > Docker Build Test Pipeline (JVM):
> > > https://github.com/apache/kafka/actions/runs/11535300463
> > > Docker Build Test Pipeline (Native):
> > > https://github.com/apache/kafka/actions/runs/11535328957
> > >
> > > Thanks to everyone who helped with this release candidate, either by
> > > contributing code, testing, or documentation.
> > >
> > > Regards,
> > > Colin
> > >
> >
>
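
The space-saving behavior at the heart of this discussion (a tagged field whose value equals its implicit default is simply omitted from the wire) can be sketched roughly like this. This is a simplified Python illustration, not Kafka's actual generated serializer; the schema shape and the host/port defaults are assumptions borrowed from the examples in these threads:

```python
# Simplified sketch of tagged-field encoding with implicit defaults.
# NOT Kafka's real serializer; the schema format and field names are
# illustrative only (inspired by the host=""/port=0 example above).

def encode_tagged_fields(fields, values):
    """Emit only the tagged fields whose value differs from the default."""
    out = []
    for tag, (name, default) in sorted(fields.items()):
        value = values.get(name, default)
        if value == default:
            continue  # implicit default: field is omitted, saving bytes
        out.append((tag, value))
    return out

# Hypothetical schema: tag -> (field name, implicit default)
schema = {0: ("host", ""), 1: ("port", 0)}

# All defaults: nothing at all is sent for these tagged fields.
assert encode_tagged_fields(schema, {"host": "", "port": 0}) == []

# Non-default values are sent together with their tag.
assert encode_tagged_fields(schema, {"host": "broker1", "port": 9092}) == [
    (0, "broker1"),
    (1, 9092),
]
```

The sketch shows why the defaults matter for compatibility: changing which values count as "the default" changes which bytes appear on the wire.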


Re: [VOTE] 3.9.0 RC2

2024-10-29 Thread Anton Agestam
Hi,

Chia-Ping has now shared in the RC5 thread two cases where the behavior under
discussion actually existed prior to 3.9, so all my arguments here are moot.
I missed those two cases because they have the same name as the struct
introduced in 3.9, and my regex excluded them by name when I was looking
for existing cases.

> It seems like we both agree that the implicit defaults are documented. I
showed you where in the README.md they are discussed.

Technically, it was *I* who showed *you* that 😅 I feel like you have
been ignoring most of the details I have been writing in this thread.
Including now with the reasoning about saved bytes ... I again feel
paraphrased.

>  Perhaps the disagreement is that you assumed that they didn't apply to
tagged fields, whereas I assumed that it was obvious that they did.

It's narrower than this, and I guess the fact that this is not understood
yet is why we have been going in circles. It was already clear that tagged
struct fields are considered to have a default value when all of the
struct's fields have explicit default values.

I opened a thread about improving the KIP process such that it accounts for
documentation improvements over time; having that in place would be a
motivation for me to contribute further to the documentation of the wire
protocol.

Cheers,
Anton
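
The rule Anton describes here (a tagged struct field previously only counted as having a default when every nested field declares an explicit one) could be sketched like this; the schema dicts are hypothetical, loosely modeled on Kafka's message JSON files, and not the real code generator's logic:

```python
# Sketch: when does a tagged struct field have a usable implicit default?
# Hypothetical schema dicts, loosely modeled on Kafka's message JSON;
# this is not the actual generator logic.

def has_implicit_default(field):
    """A struct-typed field has an implicit default only if every
    nested field carries an explicit default of its own."""
    if field.get("fields"):  # struct type
        return all("default" in f for f in field["fields"])
    return "default" in field

# Every nested field declares a default, so the struct has one too.
leader = {
    "name": "CurrentLeader", "type": "LeaderIdAndEpoch", "tag": 0,
    "fields": [
        {"name": "LeaderId", "type": "int32", "default": -1},
        {"name": "LeaderEpoch", "type": "int32", "default": -1},
    ],
}
assert has_implicit_default(leader)

# One nested field lacks an explicit default, so under this rule the
# struct as a whole has no implicit default.
partial = {
    "name": "Endpoint", "type": "SomeStruct", "tag": 1,
    "fields": [
        {"name": "Host", "type": "string"},  # no explicit default
        {"name": "Port", "type": "int32", "default": 0},
    ],
}
assert not has_implicit_default(partial)
```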


On Mon, Oct 28, 2024 at 5:02 PM Colin McCabe wrote:

> On Sun, Oct 27, 2024, at 01:44, Anton Agestam wrote:
> > Colin
> >
> > I have presented four reasons, I'll list them again below. Please let me
> > know which ones there isn't already enough information on the thread
> > already.
> >
> > - The behavior is new.
>
> Hi Anton,
>
> This behavior isn't new. I gave an example of tagged fields that have an
> implicit default in 3.8 and earlier.
>
> > - The behavior is undocumented.
>
> It seems like we both agree that the implicit defaults are documented. I
> showed you where in the README.md they are discussed. That section is from
> 2019. Perhaps the disagreement is that you assumed that they didn't apply
> to tagged fields, whereas I assumed that it was obvious that they did.
>
> It looks like Chia-Ping Tsai has opened a JIRA to clarify that implicit
> defaults do indeed apply to tagged fields. I think this will help avoid
> confusion in the future.
>
> > - The behavior is bad API design.
>
> Why is it bad API design?
>
> > - The behavior does not really save bytes *in practice*.
>
> The example you gave shows that the current behavior sends less over the
> wire than your proposed change. Those are not theoretical bytes, they are
> actual bytes.
>
> Saving space on the wire for fields that were not often used was one of
> the explicit goals of the tagged fields KIP, which was KIP-482. As it says
> in the "motivation" section:
>
>  > While [the current] versioning scheme allows us to change the message
> schemas over
>  > time, there are many scenarios that it doesn't support well.  One
> scenario
>  > that isn't well-supported is when we have data that should be sent in
> some
>  > contexts, but not others.  For example, when a MetadataRequest is made
>  > with IncludeClusterAuthorizedOperations set to true, we need to include
>  > the authorized operations in the response.  However, even when
>  > IncludeClusterAuthorizedOperations is set to false, we still must waste
>  > bandwidth sending a set of blank authorized operations fields in the
>  > response.  The problem is that the field is semantically optional in
>  > the message, but that can't be expressed in the type system for the
>  > Kafka RPC protocol.
>
> You can read it here: https://cwiki.apache.org/confluence/x/OhMyBw
>
> Obviously sending defaults over the wire, in cases where this is not
> needed, goes against that.
>
> >
> > I don't see why *fixing* the release candidate to not break documented
> > behavior should require a KIP, I would actually expect the opposite --
> the
> > new behavior that is being introduced should really have required one.
> >
> >> These two behaviors, taken together, save space on the wire
> >
> > Then you are implicitly arguing that the combination of host="" and
> > port=0 is common enough that this will practically save bytes on the
> > wire; I find that hard to believe.
> >
> > For any future schema where we want to save bytes, there is just as much
> > opportunity to save bytes on the wire with my proposal; the schema just
> > has to define explicit nested default values in order to do so.
>
> As I said, there is nothing special about 3.9. This behavior has always
> existed.
>
> If you really want to force everyone to explicitly declare a default for
> each field, then just introduce a KIP to do that. I wouldn't vote for it (I
> still don't see why this is better), but this would at least follow our
> usual process.
>
> One of the problems with forcing an explicit default everywhere is that we
> don't really have a syntax for specifying that the default should be the
> empty collection. For collections, th
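
On the decoding side, the flip side of this design is what the objection in this thread is about: a tagged field that is absent from the wire materializes as its type's zero value (empty string for a host, 0 for a port, epoch for a timestamp). A rough Python sketch, again with a hypothetical schema rather than Kafka's generated code:

```python
# Sketch of decoding a tagged-field section: fields absent from the wire
# come back as their implicit (zero-valued) defaults. Hypothetical schema
# format; not Kafka's generated deserializer.

ZERO_DEFAULTS = {"string": "", "int32": 0, "int64": 0}

def decode_tagged(schema, wire_fields):
    """schema: {tag: (field name, type name)};
    wire_fields: {tag: value} actually present on the wire."""
    result = {}
    for tag, (name, type_name) in schema.items():
        if tag in wire_fields:
            result[name] = wire_fields[tag]
        else:
            # Implicit default: the type's zero value (e.g. host="", port=0),
            # which is the behavior criticized as a bad default above.
            result[name] = ZERO_DEFAULTS[type_name]
    return result

schema = {0: ("host", "string"), 1: ("port", "int32")}
assert decode_tagged(schema, {}) == {"host": "", "port": 0}
assert decode_tagged(schema, {0: "broker1"}) == {"host": "broker1", "port": 0}
```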

Re: [VOTE] 3.9.0 RC5

2024-10-29 Thread Luke Chen
Hi Colin,

I was trying to test the RC, but I found that the Maven artifacts to be
voted upon are not up-to-date.
Not only is the "Last Modified" time of the 3.9.0 artifacts Oct. 10, but
the source code in the artifacts also doesn't include the latest commits
in RC5 here.

Could you help verify it?

But I confirmed that the release artifacts in
"https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc5/" are up-to-date.
I did:
- checked the checksums and signatures
- ran upgrade from v3.8.1 to v3.9.0
- ran quickstart with the 2.13 binaries
- verify tiered storage and KIP-950 (dynamically disablement)

I'll be +1 after the maven artifacts are updated.

Thank you for running the release.
Luke


On Wed, Oct 30, 2024 at 5:40 AM Justine Olshan 
wrote:

> Hey Colin,
>
> Thanks for the new RC. I've tested the --feature functionality that was
> missing before. Seems to be working as expected now.
> As a side note, I was looking into how to configure KIP-853 and it took me
> a moment to find the instructions in the documentation. I wonder if it
> could be included in the notable changes section of the upgrade notes. Not
> a blocker as I believe we can still update documentation.
>
> I did some other spot checks on the rest of the release. +1 (binding) from
> me
>
> Justine
>
> On Tue, Oct 29, 2024 at 12:45 PM Colin McCabe  wrote:
>
> > Thanks, Anton. And thanks Chia-Ping Tsai for taking a look at how we can
> > improve the docs here...
> >
> > best,
> > Colin
> >
> >
> > On Tue, Oct 29, 2024, at 02:39, Anton Agestam wrote:
> > > Hi Chia-Ping,
> > >
> > > Thanks for pointing those two fields out. I retract my -1.
> > >
> > > Cheers,
> > > Anton
> > >
> > > On Sun, Oct 27, 2024 at 5:40 PM Chia-Ping Tsai <chia7...@apache.org> wrote:
> > >
> > >> hi Anton
> > >>
> > >> Thanks for sharing your insights on Kafka serialization—it’s really
> cool
> > >> and interesting to me. Additionally, you inspired me to file a JIRA
> > issue
> > >> (KAFKA-17882) to improve the documentation.
> > >>
> > >> The most important aspect of Kafka is compatibility, and the
> > undocumented
> > >> behavior has been in place for some time [0][1]. This means there’s no
> > need
> > >> to rush your improvement for 3.9, as we’ll need to explicitly add
> > default
> > >> values after applying your patch to ensure we generate the same binary
> > data.
> > >>
> > >> In short, we can improve the documentation first. In the meantime, we
> > can
> > >> continue discussing behavior clarification for 4.0, and RM can keep
> > running
> > >> the RC for 3.9. Everything is on track.
> > >>
> > >> Best,
> > >> Chia-Ping
> > >>
> > >> [0]
> > >>
> >
> https://github.com/apache/kafka/blob/3.8/clients/src/main/resources/common/message/FetchSnapshotResponse.json#L43
> > >> [1]
> > >>
> >
> https://github.com/apache/kafka/blob/3.8/group-coordinator/src/main/resources/common/message/ConsumerGroupMemberMetadataValue.json#L39
> > >>
> > >> On 2024/10/27 15:28:05 Anton Agestam wrote:
> > >> > -1, refer to comments on the RC 2 thread.
> > >> >
> > >> > On Sun, Oct 27, 2024 at 2:51 AM Colin McCabe <cmcc...@apache.org> wrote:
> > >> >
> > >> > > This is the RC5 candidate for the release of Apache Kafka 3.9.0.
> > >> > >
> > >> > > - This is a major release, the final one in the 3.x line. (There
> > may of
> > >> > > course be other minor releases in this line, such as 3.9.1.)
> > >> > > - Tiered storage will be considered production-ready in this
> > release.
> > >> > > - This will be the final major release to feature the deprecated
> > >> ZooKeeper
> > >> > > mode.
> > >> > >
> > >> > > This release includes the following KIPs:
> > >> > > - KIP-853: Support dynamically changing KRaft controller
> membership
> > >> > > - KIP-1057: Add remote log metadata flag to the dump log tool
> > >> > > - KIP-1049: Add config log.summary.interval.ms to Kafka Streams
> > >> > > - KIP-1040: Improve handling of nullable values in InsertField,
> > >> > > ExtractField, and other transformations
> > >> > > - KIP-1031: Control offset translation in MirrorSourceConnector
> > >> > > - KIP-1033: Add Kafka Streams exception handler for exceptions
> > >> occurring
> > >> > > during processing
> > >> > > - KIP-1017: Health check endpoint for Kafka Connect
> > >> > > - KIP-1025: Optionally URL-encode clientID and clientSecret in
> > >> > > authorization header
> > >> > > - KIP-1005: Expose EarliestLocalOffset and TieredOffset
> > >> > > - KIP-950: Tiered Storage Disablement
> > >> > > - KIP-956: Tiered Storage Quotas
> > >> > >
> > >> > > Release notes for the 3.9.0 release:
> > >> > >
> > >>
> >
> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc5/RELEASE_NOTES.html
> > >> > >
> > >> > > *** Please download, test and vote by October 30, 2024.
> > >> > >
> > >> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > >> > > https://kafka.apache.org/KEYS
> > >> > >
> > >> > > * Release artifacts to be voted up

[jira] [Resolved] (KAFKA-17804) optimize ReplicaManager.completeDelayedOperationsWhenNotPartitionLeader

2024-10-29 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-17804.
-
Resolution: Won't Fix

> optimize ReplicaManager.completeDelayedOperationsWhenNotPartitionLeader
> ---
>
> Key: KAFKA-17804
> URL: https://issues.apache.org/jira/browse/KAFKA-17804
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jun Rao
>Assignee: kangning.li
>Priority: Minor
>
> Currently, ReplicaManager.completeDelayedOperationsWhenNotPartitionLeader is 
> called when (1) a replica is removed from the broker or (2) a replica 
> becomes a follower replica, and it checks the completion of multiple 
> purgatories.  However, not all purgatories need to be checked in both 
> situations. For example, the fetch purgatory doesn't need to be checked in 
> case (2) since we support fetch from follower now. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1043: Administration of groups

2024-10-29 Thread Andrew Schofield
This concludes the vote for KIP-1043. The KIP is now ACCEPTED.

+1 (binding): David Jacot, Lianet Magrans, Lucas Brutschy, David Arthur
+1 (non-binding): Apoorv Mittal

I aim to get it implemented in Apache Kafka 4.0.

Thanks,
Andrew


From: David Jacot 
Sent: 29 October 2024 07:45
To: dev@kafka.apache.org 
Subject: Re: [VOTE] KIP-1043: Administration of groups

Thanks for the KIP and the good discussion. +1 (binding)

Best,
DJ

On Wed, Oct 2, 2024 at 3:16 AM David Arthur  wrote:

> +1 binding
>
> Cheers,
> David A
>
> On Mon, Sep 23, 2024 at 6:43 AM Apoorv Mittal 
> wrote:
>
> > Hi Andrew,
> > Thanks for the KIP, this will be very helpful.
> >
> > +1 (non-binding)
> >
> > Regards,
> > Apoorv Mittal
> >
> >
> > On Mon, Sep 23, 2024 at 11:38 AM Lianet M.  wrote:
> >
> > > Hello Andrew,
> > >
> > > Thanks for the KIP.
> > >
> > > +1 (binding)
> > >
> > > Lianet
> > >
> > > On Mon, Sep 23, 2024, 3:37 a.m. Lucas Brutschy
> > >  wrote:
> > >
> > > > Hi Andrew,
> > > >
> > > > thanks for the KIP!
> > > >
> > > > +1 (binding)
> > > >
> > > > Cheers,
> > > > Lucas
> > > >
> > > > On Mon, Sep 23, 2024 at 9:27 AM Andrew Schofield
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > > I would like to start voting for KIP-1043: Administration of
> groups.
> > > > This KIP enhances the command-line tools to make it easier to
> > administer
> > > > groups on clusters with a variety of types of groups.
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1043%3A+Administration+of+groups
> > > > >
> > > > > Thanks.
> > > > > Andrew
> > > >
> > >
> >
>
>
> --
> David Arthur
>