[jira] [Reopened] (KAFKA-16154) Make broker changes to return an offset for LATEST_TIERED_TIMESTAMP

2024-07-17 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen reopened KAFKA-16154:
---

> Make broker changes to return an offset for LATEST_TIERED_TIMESTAMP
> ---
>
> Key: KAFKA-16154
> URL: https://issues.apache.org/jira/browse/KAFKA-16154
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Christo Lolov
>Assignee: Christo Lolov
>Priority: Major
> Fix For: 3.8.0
>
>
> A broker should start returning offsets when given a timestamp of -5, which 
> signifies a LATEST_TIERED_TIMESTAMP.
> There are three cases:
> 1. Tiered storage is not enabled: asking for LATEST_TIERED_TIMESTAMP should 
> always return no offset.
> 2. Tiered storage is enabled and there is nothing in remote storage: the 
> offset returned should be 0.
> 3. Tiered storage is enabled and there is something in remote storage: the 
> offset returned should be the highest offset the broker is aware of.
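A minimal sketch of the three cases above, purely for illustration (only the -5 sentinel value comes from the issue; the helper class, method, and parameter names are assumptions, not the actual broker patch):

import java.util.OptionalLong;

// Hypothetical helper illustrating the three cases for a -5 (LATEST_TIERED_TIMESTAMP) lookup.
public final class LatestTieredOffsetResolver {

    public static final long LATEST_TIERED_TIMESTAMP = -5L;

    /**
     * @param tieredStorageEnabled whether tiered storage is enabled for the topic
     * @param highestTieredOffset  highest offset in remote storage the broker is aware of, if any
     * @return the offset to return for a LATEST_TIERED_TIMESTAMP request, or empty if none applies
     */
    public static OptionalLong resolve(boolean tieredStorageEnabled, OptionalLong highestTieredOffset) {
        if (!tieredStorageEnabled) {
            // Case 1: tiered storage not enabled -> no offset.
            return OptionalLong.empty();
        }
        if (highestTieredOffset.isEmpty()) {
            // Case 2: tiered storage enabled but remote storage empty -> offset 0.
            return OptionalLong.of(0L);
        }
        // Case 3: return the highest offset in remote storage the broker is aware of.
        return highestTieredOffset;
    }
}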



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.8.0 RC1

2024-07-17 Thread Josep Prat
Hi all,

I'm cancelling this VOTE thread, and I'll generate a new RC once the fix is
merged.

Best,

On Wed, Jul 17, 2024 at 8:55 AM Josep Prat  wrote:

> Thanks Greg for taking a look and providing a fix!
>
> Best,
>
> --
> Josep Prat
> Open Source Engineering Director, Aiven
> josep.p...@aiven.io   |   +491715557497 | aiven.io
> Aiven Deutschland GmbH
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>
> On Wed, Jul 17, 2024, 00:40 Greg Harris 
> wrote:
>
>> Hi Josep,
>>
>> I found this blocker regression:
>> https://issues.apache.org/jira/browse/KAFKA-17150
>> Summary: Connector configurations that specified converters with aliases
>> (e.g. JsonConverter instead of
>> org.apache.kafka.connect.json.JsonConverter)
>> previously worked, and in this RC they throw validation exceptions and are
>> unable to start/run.
>> Severity: This would take tasks offline completely until the workers are
>> downgraded or the connectors are reconfigured with full class names.
>> Impact: Aliases are commonly used in demos, and slightly less commonly
>> used
>> in production environments. It is by no means a demo-only feature, and I
>> would expect this to break at least one production workload.
>> Risk: I've opened a PR to resolve it here:
>> https://github.com/apache/kafka/pull/16608 . We've already solved this
>> bug
>> once before, so it seems low-risk to widen the scope of the original fix
>> to
>> cover this new case.
>>
>> I found this apparent bug, but as it doesn't appear to be a regression and
>> has very low severity, I don't think it's a blocker on its own:
>> https://issues.apache.org/jira/browse/KAFKA-17148
>> If we're rolling a new RC, hopefully we can include this if it's resolved
>> quickly.
>>
>> I also performed the following successful verifications:
>> 1. Verified that protected members don't appear in the generated javadocs
>> (KAFKA-14839)
>> 2. Verified that Connect Distributed can start against a Kraft cluster
>> 3. Verified that plugin scanning doesn't throw errors with jackson
>> (KAFKA-17111)
>> 4. Verified that the allowed.paths configuration works for
>> DirectoryConfigProvider (KIP-993)
>>
>> Unfortunately due to the blocker regression, I think I'll have to -1
>> (binding) this RC. Sorry!
>>
>> Thanks,
>> Greg
>>
>> On Tue, Jul 16, 2024 at 1:32 PM Jakub Scholz  wrote:
>>
>> > +1 (non-binding). I used the staged Scala 2.13 binaries and the staged
>> > Maven artifacts. All seems to work fine. Thanks!
>> >
>> > Jakub
>> >
>> > On Mon, Jul 15, 2024 at 5:53 PM Josep Prat > >
>> > wrote:
>> >
>> > > Hello Kafka users, developers and client-developers,
>> > >
>> > > This is the second release candidate for Apache Kafka 3.8.0.
>> > >
>> > > Some of the major features included in this release are:
>> > > * KIP-1028: Docker Official Image for Apache Kafka
>> > > * KIP-974: Docker Image for GraalVM based Native Kafka Broker
>> > > * KIP-1036: Extend RecordDeserializationException exception
>> > > * KIP-1019: Expose method to determine Metric Measurability
>> > > * KIP-1004: Enforce tasks.max property in Kafka Connect
>> > > * KIP-989: Improved StateStore Iterator metrics for detecting leaks
>> > > * KIP-993: Allow restricting files accessed by File and Directory
>> > > ConfigProviders
>> > > * KIP-924: customizable task assignment for Streams
>> > > * KIP-813: Shareable State Stores
>> > > * KIP-719: Deprecate Log4J Appender
>> > > * KIP-390: Support Compression Level
>> > > * KIP-1018: Introduce max remote fetch timeout config for
>> > > DelayedRemoteFetch requests
>> > > * KIP-1037: Allow WriteTxnMarkers API with Alter Cluster Permission
>> > > * KIP-1047 Introduce new org.apache.kafka.tools.api.Decoder to replace
>> > > kafka.serializer.Decoder
>> > > * KIP-899: Allow producer and consumer clients to rebootstrap
>> > >
>> > > Release notes for the 3.8.0 release:
>> > > https://home.apache.org/~jlprat/kafka-3.8.0-rc1/RELEASE_NOTES.html
>> > >
>> > > Please download, test and vote by Thursday, July 18th, 12pm PT.
>> > >
>> > > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > > https://kafka.apache.org/KEYS
>> > >
>> > > * Release artifacts to be voted upon (source and binary):
>> > > https://home.apache.org/~jlprat/kafka-3.8.0-rc1/
>> > >
>> > > * Docker release artifacts to be voted upon (apache/kafka-native is
>> > > supported from the 3.8 release onwards):
>> > > apache/kafka:3.8.0-rc1
>> > > apache/kafka-native:3.8.0-rc1
>> > >
>> > > * Maven artifacts to be voted upon:
>> > >
>> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> > >
>> > > * Javadoc:
>> > > https://home.apache.org/~jlprat/kafka-3.8.0-rc1/javadoc/
>> > >
>> > > * Tag to be voted upon (off 3.8 branch) is the 3.8.0 tag:
>> > > https://github.com/apache/kafka/releases/tag/3.8.0-rc1
>> > >
>> > > Once https://github.com/apache/kafka-site/pull/608 is merged, you will
>> > > be able to find the 

Re: [DISCUSS] KAFKA-17094: How should unregistered broker nodes be handled in KRaft quorum state?

2024-07-17 Thread Gantigmaa Selenge
Thanks everyone for your input. Any more thoughts on this from anyone else?

I agree that this should be fixed in 3.9, as the issue is marked as a
blocker. I think we might need a small KIP for it though; are we OK to
include a KIP for this in the 3.9 release? I'm aware that we are past the
KIP freeze and close to the feature freeze date.

Regards,
Gantigmaa

On Fri, Jul 12, 2024 at 7:51 PM Mickael Maison 
wrote:

> Hi Igor,
>
> I think getting the list of all registered brokers is an administrator
> requirement. Typically end users are not interested in offline
> brokers. As highlighted in the Jira, the main goal for retrieving this
> information is to get broker IDs to unregister. On the other hand, I
> see the DescribeCluster API more aimed towards regular users.
>
> As mentioned in the Jira, a stray registered broker will prevent
> upgrading the cluster metadata version. For that reason, I think this
> is definitely something we should fix in 3.9.
>
> Since the field is simply called "observers", as opposed to
> "currentVoters", and the response contains the lastFetchTimestamp, I
> wonder if the controller could return all nodes and have the client do
> the online/offline logic.
>
> Thanks,
> Mickael
>
> On Thu, Jul 11, 2024 at 5:13 PM Igor Soarez  wrote:
> >
> > Hi all,
> >
> > I had left a comment on the JIRA earlier, and thanks to Mickael I was
> made
> > aware of this thread, so I'll share it here too:
> >
> > I agree it makes sense to make this information available through
> RPCs,
> > but the Quorum is probably not the right place to represent
> registered (but inactive)
> > brokers – Quorum represents only the Raft-like part of the cluster.
> >
> > I think it would make more sense to add a new field to
> DescribeClusterResponse.
> > I guess handleDescribeCluster filters out inactive brokers to
> preserve the ZK-mode behavior.
> >
> > Thanks,
> >
> > --
> > Igor
>
>


RE: [VOTE] KIP-1054: Support external schemas in JSONConverter

2024-07-17 Thread Priyanka K U
Hi Chris,

Thank you for highlighting the scenario. I tested it using the JSON below, and 
you're right: we will need to include escape sequences in schema.content for 
the configuration. Although this makes the schema look a bit complex, we can 
always provide it as a file if needed. I have updated the KIP with the 
details.


{
  "name": "api-file-source-json",
  "config": {
"value.converter.schemas.enable": true,
"value.converter.schema.content": "{\"type\": \"struct\", \"fields\": [{ 
\"field\": \"id\", \"type\": \"string\", \"optional\": false },{ \"field\": 
\"name\", \"type\": \"string\", \"optional\": false }]}"

  }
}
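For reference, the file-based alternative mentioned above could look roughly like this, using the standard FileConfigProvider (value.converter.schema.content is the property proposed in the KIP; file paths and key names are illustrative only):

# worker configuration (illustrative)
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

# /opt/connect/schemas.properties (illustrative)
id-name-schema={"type": "struct", "fields": [{"field": "id", "type": "string", "optional": false}, {"field": "name", "type": "string", "optional": false}]}

and in the connector configuration:

"value.converter.schema.content": "${file:/opt/connect/schemas.properties:id-name-schema}"

This keeps the unescaped schema in a file while the connector configuration submitted over the REST API stays small.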

Thanks,
Priyanka.

From: Chris Egerton 
Date: Wednesday, 10 July 2024 at 9:27 PM
To: dev@kafka.apache.org 
Subject: [EXTERNAL] Re: [VOTE] KIP-1054: Support external schemas in 
JSONConverter
Hi Priyanka,

The issue is that connector configurations can also be submitted via the
REST API as JSON objects, and if a user tries to embed a schema directly in
that configuration as a string, it'll be fairly ugly. This does not apply
if using a properties file (in standalone mode) or the config provider
mechanism, which is why I think your testing may have missed this scenario.
If you still think that's not the case, can you share the exact connector
configurations you tested with, and instructions for building the JSON
converter with your proposed changes (can be as simple as "check out this
git branch from my repo").

Cheers,

Chris

On Wed, Jul 10, 2024, 11:36 Priyanka K U 
wrote:

> Hi Chris,
>
> We did the testing using the JAR that supports external schema
> functionality. This included specifying the external schema file in the
> Kafka properties file, as well as embedding an inline schema directly within
> the connector properties file.
>
> Thanks,
> Priyanka
>
> From: Chris Egerton 
> Date: Friday, 5 July 2024 at 5:55 PM
> To: dev@kafka.apache.org 
> Subject: [EXTERNAL] Re: [VOTE] KIP-1054: Support external schemas in
> JSONConverter
> Hi Priyanka,
>
> How exactly did you test this feature? Was it in standalone mode (with a
> Java properties file), or with a JSON config submitted via the REST API?
>
> Cheers,
>
> Chris
>
>


[jira] [Resolved] (KAFKA-16853) Split RemoteLogManagerScheduledThreadPool

2024-07-17 Thread Abhijeet Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhijeet Kumar resolved KAFKA-16853.

Resolution: Fixed

> Split RemoteLogManagerScheduledThreadPool
> -
>
> Key: KAFKA-16853
> URL: https://issues.apache.org/jira/browse/KAFKA-16853
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Christo Lolov
>Assignee: Abhijeet Kumar
>Priority: Major
>
> *Summary*
> To begin with, create just the RemoteDataExpirationThreadPool and move 
> expiration to it. Keep all settings as if the only thread pool were still the 
> RemoteLogManagerScheduledThreadPool. Ensure that the new thread pool is wired 
> correctly to the RemoteLogManager.
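A rough sketch of what the new pool might look like, keeping behaviour unchanged initially (only the RemoteDataExpirationThreadPool, RemoteLogManagerScheduledThreadPool, and RemoteLogManager names come from the issue; everything else here is an assumption):

import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch only: a dedicated pool for remote data expiration, configured with the same
// settings as the existing RemoteLogManagerScheduledThreadPool.
public class RemoteDataExpirationThreadPool {

    private final ScheduledThreadPoolExecutor executor;

    public RemoteDataExpirationThreadPool(int poolSize) {
        this.executor = new ScheduledThreadPoolExecutor(poolSize);
        this.executor.setRemoveOnCancelPolicy(true);
    }

    // The RemoteLogManager would schedule its expiration task here instead of on the shared pool.
    public void scheduleExpiration(Runnable expirationTask, long initialDelayMs, long delayMs) {
        executor.scheduleWithFixedDelay(expirationTask, initialDelayMs, delayMs, TimeUnit.MILLISECONDS);
    }

    public void shutdown() {
        executor.shutdownNow();
    }
}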



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17154) New consumer subscribe may join group without a call to consumer.poll

2024-07-17 Thread Lianet Magrans (Jira)
Lianet Magrans created KAFKA-17154:
--

 Summary: New consumer subscribe may join group without a call to 
consumer.poll
 Key: KAFKA-17154
 URL: https://issues.apache.org/jira/browse/KAFKA-17154
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 3.8.0
Reporter: Lianet Magrans
 Fix For: 4.0.0


With the new consumer, a call to consumer.subscribe will update the client 
subscription state via a SubscriptionChangedEvent. This will be picked up by 
the background thread, and will send a Heartbeat RPC on the next poll of the 
background thread (without requiring a consumer.poll). This is a change in 
behaviour compared to the legacy consumer that breaks the consumer#subscribe 
contract that "rebalances will only occur during an active call to \{@link 
#poll(Duration)}", so it should be fixed.

In the legacy consumer it's a similar principle but different behaviour: 
subscribe changes the subscription state, and the coordinator picks it up to 
send the JoinRequest on the next poll of the coordinator (but this happens only 
as part of a consumer.poll).

We should make the new consumer join (send the HB to join) only on consumer.poll 
after a call to subscribe. We could consider having the 
SubscriptionChangedEvent only signal to the background thread that the 
subscription changed, without doing the transition to JOINING that triggers the 
HB (membershipMgr#onSubscriptionUpdated could simply flip a flag, no 
transition), and then do the actual transition to JOINING when processing a 
PollEvent if the flag is on (flipping it back), as sketched below. (This is 
just a first simple approach that comes to mind; we should think about it a bit 
more and consider other interactions that could happen in between, like 
unsubscribe, close, etc.)
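A rough sketch of that flag-based approach (only membershipMgr#onSubscriptionUpdated, SubscriptionChangedEvent, and PollEvent are named in the issue; the class and other method names below are assumptions, not the actual client code):

// Sketch only: defer the JOINING transition until the application calls consumer.poll().
public class MembershipManagerSketch {

    private volatile boolean subscriptionUpdated = false;

    // Called when the background thread processes a SubscriptionChangedEvent.
    public void onSubscriptionUpdated() {
        // Only record that the subscription changed; do NOT transition to JOINING here,
        // so no heartbeat is sent before the application calls consumer.poll().
        subscriptionUpdated = true;
    }

    // Called when the background thread processes a PollEvent from consumer.poll().
    public void onPollEvent() {
        if (subscriptionUpdated) {
            subscriptionUpdated = false;
            transitionToJoining(); // triggers the heartbeat that actually joins the group
        }
    }

    private void transitionToJoining() {
        // In the real client this would move the member state machine to JOINING.
    }
}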

We should add test coverage to ensure behaviour consistent with the legacy consumer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1043: Administration of groups

2024-07-17 Thread Andrew Schofield
Hi Lianet,
Thanks for your comments.

LM5. Unfortunately, the protocol type has to be a string rather than
an enumeration. This is because when people have created their own
extensions of the classic consumer group protocol, they have chosen
their own protocol strings. For example, the Confluent schema registry
uses “sr” and there are other examples in the wild.

LM6.1. It’s because of the difference between a parameterised
type and a raw type.

If I use:
public class ListGroupsResult
public class ListConsumerGroupsResult

then ListGroupsResult used without a type argument is a raw type which does
not provide a type for the type variable. This causes compiler warnings
whenever the raw type is used rather than a fully parameterised one.

However, this works better.
public class AbstractListGroupsResult
public class ListGroupsResult extends AbstractListGroupsResult
public class ListConsumerGroupsResult extends 
AbstractListGroupsResult

I’ll change the KIP to use this.
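(The mail archive appears to have stripped the generic type parameters from the class declarations above.) A standalone sketch of the pattern being described, with type parameters restored as I read them; the placeholder listing types and the future type are illustrative, not the exact Admin API classes from the KIP:

import java.util.Collection;
import java.util.concurrent.CompletableFuture;

// Illustrative stand-ins for the KIP's listing types.
class GroupListing { }
class ConsumerGroupListing extends GroupListing { }

// Shared behaviour lives in the parameterised abstract base, so callers never
// see a raw (unparameterised) type.
abstract class AbstractListGroupsResult<T extends GroupListing> {
    private final CompletableFuture<Collection<T>> future;

    AbstractListGroupsResult(CompletableFuture<Collection<T>> future) {
        this.future = future;
    }

    public CompletableFuture<Collection<T>> all() {
        return future;
    }
}

// Slim, fully parameterised subclasses.
class ListGroupsResult extends AbstractListGroupsResult<GroupListing> {
    ListGroupsResult(CompletableFuture<Collection<GroupListing>> future) {
        super(future);
    }
}

class ListConsumerGroupsResult extends AbstractListGroupsResult<ConsumerGroupListing> {
    ListConsumerGroupsResult(CompletableFuture<Collection<ConsumerGroupListing>> future) {
        super(future);
    }
}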

LM6.2. I was just pointing out a difference and you’re happy
with it. That’s good.

LM7. If you have a cluster with a mixture of classic and modern
consumer groups, to list them all you could use this:

bin/kafka-groups.sh --protocol consumer

When there are no classic consumer groups, you could do:

bin/kafka-groups.sh --group-type consumer

However, this only gives a complete list if you don’t have any classic
consumer groups.

As a result, I suggested --consumer so you don’t need to know
or care about the existence of classic and modern consumer groups.
I think it helps, but I don’t think you’re convinced, which tells me
more thinking is needed here.

Maybe adding --share would help, so only power users would
use --group-type or --protocol to deal with the more complicated
cases.

LM8. It’s just not clear. I was trying to make the output the same
whether the group was created, or whether it already existed. In
either case, there’s a share group in existence. The
AlterShareGroupOffsets RPC doesn’t distinguish between the
two cases in its response.

Thanks,
Andrew

> On 16 Jul 2024, at 21:24, Lianet M.  wrote:
>
> Hello Andrew, thanks for considering the feedback. Some follow-ups and
> other comments:
>
> LM4. Good point about the older RPC versions and therefore the
> Optional, agreed.
>
> LM5. In GroupListing, should we use the existing
> org.apache.kafka.clients.ProtocolType to represent the protocol (instead of
> String). I don’t quite like the fact that the enum is inside the
> GroupRebalanceConfig though, feels like it should be a first level citizen.
>
>
> LM6. Regarding the changes around ListGroupResults with generics.
> - LM6.1. What’s the need for keeping both, a base
> AbstractListGroupsResult and the ListGroupsResult
> extends AbstractListGroupsResult? Would it work if instead we
> simply have a single ListGroupsResult from which
> specific groups would inherit? I'm thinking of this:
>
> public class *ListGroupsResult* -> this would
> probably end up containing the implementation that currently exists in
> ListConsumerGroupsResult for #all, #errors and #valid, that all group types
> would be able to reuse if we use a generic T extends GroupListing
>
>
> public class *ListConsumerGroupsResult extends
> ListGroupsResult* -> slim impl, agreed
>
> - LM6.2. Related to the concern of the javadoc for
> ListConsumerGroupsResult. This class will inherit 3 funcs (all, valid,
> error), that have a common behaviour (and javadoc) regardless of the
> generic type, so I expect it won’t be confusing in practice? We will end up
> with the java doc for, let’s say, ListConsumerGroupsResult#all showing the
> parent javadoc that aligns perfectly with what the #all does. If ever we
> need a different behaviour/javadoc for any of the functions in the child
> classes, we would have the alternative of overriding the func and javadoc.
> Makes sense? Not sure if I’m missing other readability issues with the
> javadocs you’re seeing.
>
> LM7. Looks better to me now with the added filter on the kafka-groups.sh for
> the protocol. But then, the new --consumer filter achieves the same as
> --protocol CONSUMER, right? If so, I wonder if it would just be simpler to
> support the --protocol as a way to achieve this? (sharing your struggle on
> how to get this right, but it feels easier to discover and reason about the
> more we have filters based on the output, and not made up of
> combinations... let's keep iterating and we'll get there :) )
>
> LM8. Is the output wrong (or just not clear) in this example? (It seemed to
> me this was referring to the successful case where we create a new share
> group, so I was expecting a "successfully created" kind of output)
>
>  $ bin/kafka-share-groups.sh --bootstrap-server localhost:9092
> --create --group NewShareGroup
>  Share group 'NewShareGroup' exists.
>
> Thanks!
>
> Lianet
>
> On Mon, Jul 15, 2024 at 6:00 AM Andrew Schofield 
> wrote:
>
>> Hi Lianet,
>> Thanks for your comments.
>>
>> LM1. Admin.listGroups() in principl

Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-17 Thread Mickael Maison
Hi David,

DA1: It's a point I considered, especially being able to cordon a
whole broker. With the current proposal you just need to set
cordoned.log.dirs to the same value as log.dirs. That does not seem
excessively difficult, as shown in the sketch below.
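For example, to cordon a whole broker under the current proposal (cordoned.log.dirs is the property defined in the KIP; the paths are illustrative):

# broker.properties (illustrative paths)
log.dirs=/var/kafka/data-0,/var/kafka/data-1
# Cordoning every log directory effectively cordons the whole broker.
cordoned.log.dirs=/var/kafka/data-0,/var/kafka/data-1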

DA2: I did consider using a new metadata record like we do for
fence/unfence. Since I don't expect cordoned log dirs to change very
frequently, and the size should be small, I opted to reuse the
BrokerRegistrationRecord metadata record. At the moment I guess it was
mostly for the convenience while prototyping. Semantically it probably
makes sense to have separate records. Your further points suggest
design changes in the mechanism as a whole, so let's discuss these
first and we can return to the exact metadata records after.
I find the idea of having dedicated RPCs interesting. One of the
reasons I piggybacked on the heartbeating process is for the mapping
between log directory names and their UUIDs. Currently the mapping
only exists on each broker instance. So if we wanted a dedicated RPC,
we would first need to change how we manage the log directory to UUID
mappings. I guess this could be done via the BrokerRegistration API.
I'm not sure about storing additional metadata (reason, timestamp). We
currently don't do that for any operations
(AlterPartitionReassignments, UnregisterBroker). Typically these are
stored in the tools used by the operators to drive these operations.
You bring another interesting point about the ability to cordon
brokers/log directories while they are offline. It's not something the
current proposal supports. I'm not sure how useful this would turn out
in practice. In my experience brokers are mostly online, so I'd expect
the need to do so to be relatively rare. This also kind of loops back
to KAFKA-17094 [0] and the discussion Gantigmaa started on the dev
list [1] about being able to identify the ids of offline (but still
registered) brokers.

DA3: With the current proposal, I don't see a reason why you would
want to disable the new behavior. If you don't want to use it, you
don't need to do anything. It's opt-in as you need to set cordoned.log.dirs
on some brokers to get the new behavior. If you don't want it anymore,
you should unset cordoned.log.dirs. Can you explain why this would not
work?

DA4: Yes

0: https://issues.apache.org/jira/browse/KAFKA-17094
1: https://lists.apache.org/thread/1rrgbhk43d85wobcp0dqz6mhpn93j9yo

Thanks,
Mickael


On Sun, Jul 14, 2024 at 10:37 AM Kamal Chandraprakash
 wrote:
>
> Hi Mickael,
>
> In the BrokerHeartbeatRequest.json, the flexibleVersions are bumped from
> "0+" to "1+". Is it a typo?
>
>
> On Fri, Jul 12, 2024 at 11:42 PM David Arthur  wrote:
>
> > Mickael, thanks for the KIP! I think this could be quite a useful feature.
> >
> > DA1: Having to know each of the log dirs for a broker seems a bit
> > inconvenient for cases where we want to cordon off a whole broker. I do
> > think having the ability to cordon off a specific log dir is useful for
> > JBOD, but I imagine a common case might be to cordon off the whole broker.
> >
> > DA2: Looks like the new "cordoned.log.dirs" can be configured statically
> > and updated dynamically per-broker. What do you think about a new metadata
> > record and RPC instead of using a config? From my understanding, the
> > BrokerRegistration/Heartbeat is more about the lifecycle of a broker
> > whereas cordoning a broker is an operator driven action. It might make
> > sense to have a separate record for this. We could include additional
> > fields like a timestamp, a reason/comment field (e.g., "decommissioning",
> > "disk failure", "new broker" etc), stuff like that.
> >
> > This would also allow cordoning to be done while a broker is offline or
> > before it has been provisioned. Not sure how likely that is, but might be
> > useful?
> >
> > DA3: Can we consider having a configuration to enable/disable the new
> > replica placer behavior? This would be separate from the new
> > MetadataVersion for the RPC/record changes.
> >
> > DA4: In the Motivation section, you mention the cluster expansion scenario.
> > For this scenario, is the expectation that the operator will cordon off the
> > existing full brokers so placements only happen on the new brokers?
> >
> > Cheers,
> > David
> >
> > On Fri, Jul 12, 2024 at 8:53 AM Mickael Maison 
> > wrote:
> >
> > > Hi Kamal,
> > >
> > > Thanks for taking a look at the KIP!
> > >
> > > I briefly considered that option initially but I found it not very
> > > practical once you have more than a few cordoned log directories.
> > > I find your example is already not very easy to read, and it only has
> > > 2 entries. Also if the configuration is at the cluster level it's
> > > not easy to see if a broker has all its log directories cordoned, and
> > > you still need to describe a specific broker's configuration to find
> > > the "name" of a log directory you want to cordon.
> > >
> > > I think an easy way to get an overall view of the cordoned log
> > > direc

Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-17 Thread Mickael Maison
Hi Kamal,

Good spot, yes this is a typo. The flexibleVersions stays as "0+". Fixed

Thanks,
Mickael

On Wed, Jul 17, 2024 at 6:14 PM Mickael Maison  wrote:
>
> Hi David,
>
> DA1: It's a point I considered, especially being able to cordon a
> whole broker. With the current proposal you just need to set
> cordoned.log.dirs to the same value as log.dirs. That does not seem
> excessively difficult.
>
> DA2: I did consider using a new metadata record like we do for
> fence/unfence. Since I don't expect cordoned log dirs to change very
> frequently, and the size should be small, I opted to reuse the
> BrokerRegistrationRecord metadata record. At the moment I guess it was
> mostly for the convenience while prototyping. Semantically it probably
> makes sense to have separate records. Your further points suggest
> design changes in the mechanism as a whole, so let's discuss these
> first and we can return to the exact metadata records after.
> I find the idea of having dedicated RPCs interesting. One of the
> reasons I piggybacked on the heartbeating process is for the mapping
> between log directory names and their UUIDs. Currently the mapping
> only exists on each broker instance. So if we wanted a dedicated RPC,
> we would first need to change how we manage the log directory to UUID
> mappings. I guess this could be done via the BrokerRegistration API.
> I'm not sure about storing additional metadata (reason, timestamp). We
> currently don't do that for any operations
> (AlterPartitionReassignments, UnregisterBroker). Typically these are
> stored in the tools used by the operators to drive these operations.
> You bring another interesting point about the ability to cordon
> brokers/log directories while they are offline. It's not something the
> current proposal supports. I'm not sure how useful this would turn out
> in practice. In my experience brokers are mostly online, so I'd expect
> the need to do so to be relatively rare. This also kind of loops back
> to KAFKA-17094 [0] and the discussion Gantigmaa started on the dev
> list [1] about being able to identify the ids of offline (but still
> registered) brokers.
>
> DA3: With the current proposal, I don't see a reason why you would
> want to disable the new behavior. If you don't want to use it, you
> don't need to do anything. It's opt-in as you need to set cordoned.log.dirs
> on some brokers to get the new behavior. If you don't want it anymore,
> you should unset cordoned.log.dirs. Can you explain why this would not
> work?
>
> DA4: Yes
>
> 0: https://issues.apache.org/jira/browse/KAFKA-17094
> 1: https://lists.apache.org/thread/1rrgbhk43d85wobcp0dqz6mhpn93j9yo
>
> Thanks,
> Mickael
>
>
> On Sun, Jul 14, 2024 at 10:37 AM Kamal Chandraprakash
>  wrote:
> >
> > Hi Mickael,
> >
> > In the BrokerHeartbeatRequest.json, the flexibleVersions are bumped from
> > "0+" to "1+". Is it a typo?
> >
> >
> > On Fri, Jul 12, 2024 at 11:42 PM David Arthur  wrote:
> >
> > > Mickael, thanks for the KIP! I think this could be quite a useful feature.
> > >
> > > DA1: Having to know each of the log dirs for a broker seems a bit
> > > inconvenient for cases where we want to cordon off a whole broker. I do
> > > think having the ability to cordon off a specific log dir is useful for
> > > JBOD, but I imagine a common case might be to cordon off the whole broker.
> > >
> > > DA2: Looks like the new "cordoned.log.dirs" can be configured statically
> > > and updated dynamically per-broker. What do you think about a new metadata
> > > record and RPC instead of using a config? From my understanding, the
> > > BrokerRegistration/Heartbeat is more about the lifecycle of a broker
> > > whereas cordoning a broker is an operator driven action. It might make
> > > sense to have a separate record for this. We could include additional
> > > fields like a timestamp, a reason/comment field (e.g., "decommissioning",
> > > "disk failure", "new broker" etc), stuff like that.
> > >
> > > This would also allow cordoning to be done while a broker is offline or
> > > before it has been provisioned. Not sure how likely that is, but might be
> > > useful?
> > >
> > > DA3: Can we consider having a configuration to enable/disable the new
> > > replica placer behavior? This would be separate from the new
> > > MetadataVersion for the RPC/record changes.
> > >
> > > DA4: In the Motivation section, you mention the cluster expansion 
> > > scenario.
> > > For this scenario, is the expectation that the operator will cordon off 
> > > the
> > > existing full brokers so placements only happen on the new brokers?
> > >
> > > Cheers,
> > > David
> > >
> > > On Fri, Jul 12, 2024 at 8:53 AM Mickael Maison 
> > > wrote:
> > >
> > > > Hi Kamal,
> > > >
> > > > Thanks for taking a look at the KIP!
> > > >
> > > > I briefly considered that option initially but I found it not very
> > > > practical once you have more than a few cordoned log directories.
> > > > I find your example is already not very easy to read, and it only ha

[jira] [Created] (KAFKA-17155) Redundant rebalances triggered after connector creation/deletion and task config updates

2024-07-17 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-17155:
-

 Summary: Redundant rebalances triggered after connector 
creation/deletion and task config updates
 Key: KAFKA-17155
 URL: https://issues.apache.org/jira/browse/KAFKA-17155
 Project: Kafka
  Issue Type: Bug
  Components: connect
Affects Versions: 3.8.0, 3.9.0
Reporter: Chris Egerton


With KAFKA-17105, a scenario is described where a connector may be 
unnecessarily restarted soon after it has been created.

Similarly, when any events occur that set the 
[DistributedHerder.needsReconfigRebalance 
flag|https://github.com/apache/kafka/blob/a66a59f427b30611175fd029d86832d00aa5aabd/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L215]
 to true (at the time of writing these are the detection of a new connector, 
the removal of an existing connector, or the detection of new task 
configurations regardless of whether existing configurations existed for the 
connector), it is possible that a rebalance has already started because another 
worker has detected this change as well. In that case, 
{{needsReconfigRebalance}} will still be set to {{true}} even after that 
rebalance has taken place, and the worker will force an unnecessary second 
rebalance.

We might consider changing the "needs reconfig rebalance" field into a 
"reconfig rebalance threshold" field, which contains the latest offset of a 
record consumed from the config topic that warrants a rebalance. When possibly 
performing rebalances based on this field, the worker can check if the offset 
in the assignment given out by the leader during the most recent rebalance is 
greater than or equal to this threshold, and if so, choose not to force a 
rebalance.
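A rough sketch of that threshold idea (the field and method names below are assumptions, not the actual DistributedHerder code):

// Sketch only: replace the boolean "needs reconfig rebalance" flag with an offset threshold.
public class ReconfigRebalanceTracker {

    // Highest config-topic offset whose record warrants a rebalance; -1 means none pending.
    private long reconfigRebalanceThreshold = -1L;

    // Called when a record that warrants a rebalance is read from the config topic.
    public synchronized void recordConfigUpdate(long configTopicOffset) {
        reconfigRebalanceThreshold = Math.max(reconfigRebalanceThreshold, configTopicOffset);
    }

    // Called after a rebalance completes, with the config offset from the leader's assignment.
    // Returns true only if the assignment did not already cover the pending change.
    public synchronized boolean needsRebalance(long assignmentConfigOffset) {
        if (reconfigRebalanceThreshold < 0) {
            return false;
        }
        if (assignmentConfigOffset >= reconfigRebalanceThreshold) {
            // The completed rebalance already covered the change; skip the redundant rebalance.
            reconfigRebalanceThreshold = -1L;
            return false;
        }
        return true;
    }
}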

 

This has caused issues in some tests, but may be a benign race condition 
that does not have practical consequences in the real world. We may not want to 
address this (especially with an approach that increases the complexity of the 
code base and comes with risk of regression) until/unless someone reports that 
it has affected them outside of Kafka Connect unit tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17156) Metadata controller to provide the KafkaRaftClient with the broker's supported kraft.version

2024-07-17 Thread Jira
José Armando García Sancio created KAFKA-17156:
--

 Summary: Metadata controller to provide the KafkaRaftClient with 
the broker's supported kraft.version
 Key: KAFKA-17156
 URL: https://issues.apache.org/jira/browse/KAFKA-17156
 Project: Kafka
  Issue Type: Sub-task
Reporter: José Armando García Sancio






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #3116

2024-07-17 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-17157) Refactor mutability and state transitions in DistributedHerder

2024-07-17 Thread Greg Harris (Jira)
Greg Harris created KAFKA-17157:
---

 Summary: Refactor mutability and state transitions in 
DistributedHerder
 Key: KAFKA-17157
 URL: https://issues.apache.org/jira/browse/KAFKA-17157
 Project: Kafka
  Issue Type: Wish
Reporter: Greg Harris


The DistributedHerder is a very large and interconnected class, and is 
responsible for:
 * Servicing REST requests
 * Making REST requests
 * Maintaining group membership
 * Effecting the rebalance operations (starting, stopping connectors and tasks)
 * Effecting session key rotation
 * Emitting metrics

This has led to the DistributedHerder collecting 20+ mutable fields with 
somewhat non-trivial relationships between one another. The DistributedHerder 
(at time of writing this description) has 3k+ lines of code, 100+ import 
statements for 60+ project classes. 
It is also one of the most tested classes in Connect, which should enable us to 
proceed with a refactor with a low probability of regression. We should 
investigate and determine if there is an opportunity to compartmentalize the 
state management of the DistributedHerder to make it easier to reason about 
this class, and test sub-units of the logic in isolation from the 
DistributedHerder itself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-17105) Unnecessary connector restarts after being newly created

2024-07-17 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-17105.
---
Resolution: Fixed

> Unnecessary connector restarts after being newly created
> 
>
> Key: KAFKA-17105
> URL: https://issues.apache.org/jira/browse/KAFKA-17105
> Project: Kafka
>  Issue Type: Bug
>  Components: connect
>Affects Versions: 3.1.0, 3.0.0, 3.0.1, 3.2.0, 3.1.1, 3.3.0, 3.0.2, 3.1.2, 
> 3.2.1, 3.4.0, 3.2.2, 3.2.3, 3.3.1, 3.3.2, 3.5.0, 3.4.1, 3.6.0, 3.5.1, 3.5.2, 
> 3.7.0, 3.6.1, 3.6.2, 3.8.0, 3.7.1, 3.9.0
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Minor
> Fix For: 3.9.0
>
>
> When a connector is created, it may be restarted unnecessarily immediately 
> after it is first started by the worker to which it has been assigned:
>  # Connector config is written to the config topic
>  # A worker reads the new record from the config topic, and adds the 
> connector to its connectorConfigUpdates field (see 
> [here|https://github.com/apache/kafka/blob/43676f7612b2155ecada54c61b129d996f58bae2/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L2445])
>  # Another worker has already seen the new connector config and triggered a 
> rebalance; this worker participates in the ensuing rebalance (see 
> [here|https://github.com/apache/kafka/blob/43676f7612b2155ecada54c61b129d996f58bae2/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L419])
>  and is assigned the connector
>  # After the rebalance is over (see 
> [here|https://github.com/apache/kafka/blob/43676f7612b2155ecada54c61b129d996f58bae2/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L422]),
>  the worker starts all of the connectors it has been newly-assigned (see 
> [here|https://github.com/apache/kafka/blob/43676f7612b2155ecada54c61b129d996f58bae2/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1843]
>  and 
> [here|https://github.com/apache/kafka/blob/43676f7612b2155ecada54c61b129d996f58bae2/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1963-L2001])
>  # Once finished with that, the worker checks for new connector configs (see 
> [here|https://github.com/apache/kafka/blob/43676f7612b2155ecada54c61b129d996f58bae2/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L535])
>  and restarts all connectors in the connectorConfigUpdates field (see 
> [here|https://github.com/apache/kafka/blob/43676f7612b2155ecada54c61b129d996f58bae2/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L717-L726]).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-17 Thread Greg Harris
Hi all,

We found a blocker in 3.8.0-rc1,
https://issues.apache.org/jira/browse/KAFKA-17150, whose fix is now merged to
trunk and backported to 3.8.
Additionally I've merged and backported the lower-severity
https://issues.apache.org/jira/browse/KAFKA-17148 to trunk, 3.8, and 3.7
because Dmitry Werner provided a very prompt fix.
Thanks to Chia-Ping, Chris, and Josep for prompt reviews. At this time I am
not aware of any more blockers for 3.8.0 and we can proceed with rc2.

Thanks!
Greg

On Wed, Jul 10, 2024 at 10:49 PM Josep Prat 
wrote:

> Hi Greg,
>
> I'll make sure the PR with the fix (
> https://github.com/apache/kafka/pull/16565) gets merged today. Thanks for
> bringing this up!
>
> Best,
>
> --
> Josep Prat
> Open Source Engineering Director, Aiven
> josep.p...@aiven.io   |   +491715557497 | aiven.io
> Aiven Deutschland GmbH
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>
> On Wed, Jul 10, 2024, 22:55 Greg Harris 
> wrote:
>
> > Hi Josep,
> >
> > A contributor just raised a regression [1] that I think should be
> addressed
> > in 3.8.0 prior to the release.
> >
> > Summary: This change [2] causes multiple ERROR logs to appear during
> worker
> > startup when operators install other, unrelated plugins that package the
> > Jackson library.
> > Severity: Workers will continue to operate normally and the errors will
> be
> > cosmetic. Operators and automatic systems that watch logs may roll back
> > upgrades due to the perception of a severe problem.
> > Impact: I found 12 out of 250 third-party plugins that package the
> Jackson
> > library and trigger this error on upgrade. This will almost certainly
> > affect several users upon upgrades, and does not require obscure setups.
> > Risk: The contributor has opened a simple fix PR [3], and I have verified
> > that it addresses the problem, and can be merged tomorrow. As an
> > alternative, we can revert the performance change completely [4] but I
> want
> > to avoid this.
> >
> > With the above, what would you like to do for this release? Merge the
> fix,
> > revert, or leave as-is?
> >
> > Thanks,
> > Greg
> >
> > [1] https://issues.apache.org/jira/browse/KAFKA-17111
> > [2] https://issues.apache.org/jira/browse/KAFKA-15996
> > [3] https://github.com/apache/kafka/pull/16565
> > [4] https://github.com/apache/kafka/pull/16568
> >
> > On Wed, Jul 10, 2024 at 7:37 AM Mickael Maison  >
> > wrote:
> >
> > > Hi Dongjin,
> > >
> > > It's great to see you back!
> > > I hope what we did with KIP-390 matches your expectations. Looking
> > > forward to seeing the reboot of KIP-780.
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Wed, Jul 10, 2024 at 4:21 PM Dongjin Lee 
> wrote:
> > > >
> > > > Hi Josep,
> > > >
> > > > OMG, what happened while I could not be involved with the Kafka
> > > community?
> > > > Thanks for digging down the whole situation.
> > > >
> > > > @Mickael I greatly appreciate your effort in finalizing the feature.
> It
> > > > seems like I only have to re-boot the KIP-780.
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > On Tue, Jul 9, 2024 at 12:52 AM Josep Prat
>  > >
> > > > wrote:
> > > >
> > > > > Hi Dongjin,
> > > > >
> > > > > KIP-390 is part of the 3.8 release because the JIRA associated with
> > it:
> > > > > https://issues.apache.org/jira/browse/KAFKA-7632 is closed as
> > > resolved,
> > > > > hence the KIP is declared done and ready. I did some digging, and I
> > saw
> > > > > that Mickael was the one doing the PR that closed the JIRA ticket:
> > > > > https://github.com/apache/kafka/pull/15516
> > > > > This means that the KIP work is merged and unfortunately it is now
> > > quite
> > > > > late to perform a rollback for this feature.
> > > > >
> > > > > @Mickael Maison  let me know if
> anything I
> > > > > mentioned is not accurate (as you were the one bringing the KIP to
> > > > > completion).
> > > > >
> > > > > Best,
> > > > >
> > > > > On Mon, Jul 8, 2024 at 5:38 PM Dongjin Lee 
> > wrote:
> > > > >
> > > > > > Hi Josep,
> > > > > >
> > > > > > Thanks for managing the 3.8 release. I have a request: could you
> > > please
> > > > > > move the KIP-390 into the 3.9 release?
> > > > > >
> > > > > > Here is the background: KIP-390 was adopted first but hasn't been
> > > > > released
> > > > > > for a long time. After some time, I proposed KIP-780 with further
> > > > > > improvements and also corrected an obvious design error
> > > > > > (`compression.level` → `compression.(gzip|lz4|zstd). level`), but
> > it
> > > > > hasn't
> > > > > > been adopted due to the community's lack of response, my changing
> > > job,
> > > > > > focusing the in-house fork, etc. And last weekend, I found that
> > > KIP-380
> > > > > has
> > > > > > been included in the 3.8 release plan.
> > > > > >
> > > > > > - KIP-390:
> > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level
> 

[jira] [Resolved] (KAFKA-17150) Connect converter validation does not accept class aliases

2024-07-17 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-17150.
-
  Assignee: Greg Harris
Resolution: Fixed

> Connect converter validation does not accept class aliases
> --
>
> Key: KAFKA-17150
> URL: https://issues.apache.org/jira/browse/KAFKA-17150
> Project: Kafka
>  Issue Type: Bug
>  Components: connect
>Affects Versions: 3.8.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Blocker
> Fix For: 3.8.0
>
>
> Connect allows users to specify aliases instead of full classes when 
> configuring plugins. For example, the following configuration is valid:
> {noformat}
> {
>         "connector.class": "FileStreamSource",
>         "topic": "anything",
>         "key.converter": "JsonConverter"
> }{noformat}
> In <= 3.7.1 this will pass validation and function correctly.
> In 3.8.0-rc1 this will instead fail with this error message:
> {noformat}
> [2024-07-16 13:58:54,250] ERROR Failed to instantiate key converter class 
> JsonConverter; this should have been caught by prior validation logic 
> (org.apache.kafka.connect.runtime.AbstractHerder:462)
> java.lang.ClassNotFoundException: JsonConverter
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:534)
>   at java.base/java.lang.Class.forName(Class.java:513)
>   at org.apache.kafka.common.utils.Utils.loadClass(Utils.java:426)
>   at org.apache.kafka.common.utils.Utils.newInstance(Utils.java:415)
>   at 
> org.apache.kafka.connect.runtime.AbstractHerder.validateConverterConfig(AbstractHerder.java:460)
>   at 
> org.apache.kafka.connect.runtime.AbstractHerder.validateKeyConverterConfig(AbstractHerder.java:528)
>   at 
> org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:734)
>   at 
> org.apache.kafka.connect.runtime.AbstractHerder.lambda$validateConnectorConfig$3(AbstractHerder.java:574)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>   at java.base/java.lang.Thread.run(Thread.java:1583){noformat}
> and this error will eventually be shown to the user:
> {noformat}
> Connector configuration is invalid and contains the following 1 error(s):
> Failed to load class JsonConverter: JsonConverter{noformat}
> When the full class name is specified, the class is found and passes 
> validation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] Update powered-by_adding SPITHA.html [kafka-site]

2024-07-17 Thread via GitHub


mjsax commented on PR #611:
URL: https://github.com/apache/kafka-site/pull/611#issuecomment-2235228500

   Thanks for getting back @VictorParkM -- I did look at the webpage but cannot 
see any changes? The `(R)` signs are still there, but I cannot find a 
"copyright disclaimer" -- Did I miss it? Can you point me to it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update powered-by_adding SPITHA.html [kafka-site]

2024-07-17 Thread via GitHub


VictorParkM commented on PR #611:
URL: https://github.com/apache/kafka-site/pull/611#issuecomment-2235325695

   You can find the copyright disclaimer at the bottom of the page.
   [image: image.png]
   Thank you
   
   
   
   
   JeongHun Park
   
   Business Development / Team Manager
   
   SPITHA INC.
   43, Daesagwan-ro, Yongsan-gu, 04401
   SEOUL, SOUTH KOREA
   M) +82.010.4321.2320
   E) ***@***.*** ***@***.***>
   
   *spitha.io *
   
   
   
   The content of this email is confidential and intended for the recipient
   specified in message only. It is strictly forbidden to share any part of
   this message with any third party, without a written consent of the sender.
   If you received this message by mistake, please reply to this message and
   follow with its deletion, so that we can ensure such a mistake does not
   occur in the future.
   
   
   2024년 7월 18일 (목) 오후 12:14, Matthias J. Sax ***@***.***>님이 작성:
   
   > Thanks for getting back @VictorParkM  --
   > I did look at the webpage but cannot see any changes? The (R) signs are
   > still there, but I cannot find a "copyright disclaimer" -- Did I miss it?
   > Can you point me to it?
   >
   > —
   > Reply to this email directly, view it on GitHub
   > ,
   > or unsubscribe
   > 

   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.8 #70

2024-07-17 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #186

2024-07-17 Thread Apache Jenkins Server
See 




Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #3118

2024-07-17 Thread Apache Jenkins Server
See