Re: [DISCUSS] KIP-853: KRaft Voters Change

2022-07-22 Thread Tom Bentley
Hi José,

Thanks for the KIP. As Justine mentioned, this KIP currently lacks a
motivation, and nor does the JIRA provide any context. Please could you
provide this context, otherwise it's impossible for people on this list to
understand the problem you're trying to solve here.

Many thanks,

Tom

On Fri, 22 Jul 2022 at 01:04, Colin McCabe  wrote:

> Hi José,
>
> Thanks for the KIP! I have not had time to fully digest it, but I had some
> initial questions:
>
> 1. It seems like the proposal is to have a UUID per partition directory on
> the voter. If I understand correctly, this is sometimes referred to as
> "VoterUUID" and sometimes as "ReplicaUUID." The latter seems more accurate,
> since a single voter could have multiple of these IDs, in a situation where
> we had multiple Raft topics. So it would be good to standardize on that.
> Also, I didn't see a description of how this would be stored in the log
> directory. That would be good to add.
>
> 2. When we originally did the Raft and Quorum Controller KIPs, one
> contentious topic was node IDs. We eventually settled on the idea that
> broker and controller IDs were in the same ID space. So you can't (for
> example) have a broker 3 that is in a separate JVM from controller 3. This
> is pretty easy to enforce with a static configuration, but it seems like it
> will be harder to do dynamically.
>
> I would like to keep this invariant. This probably requires us to reject
> attempts to add a new quorum voter which duplicates a broker ID (except in
> the special case of co-location!) Similarly, we should reject broker
> registrations that duplicate an unrelated controller ID. The broker's
> incarnation ID is the key to doing this, I think. But that requires us to
> send the incarnation ID in many of these RPCs.
>
> 3. Is it really necessary to put the endpoint information into the
> AddVoterRecord? It seems like that could be figured out at runtime, like we
> do today. If we do need it, it seems particularly weird for it to be
> per-partition (will we have a separate TCP port for each Raft partition?) I
> also don't know why we'd want multiple endpoints. We have that for the
> broker because the endpoints have different uses, but that isn't the case
> here.
>
> The original rationale for multiple endpoints on the controllers was to
> support migration from PLAINTEXT to SSL (or whatever). But that only
> requires multiple listeners to be active on the receive side, not send
> side. A single voter never needs more than one endpoint to contact a peer.
>
> Overall, I think we'd be better off keeping this as soft state rather than
> adding it to the log. Particularly if it's not in the log at all for the
> static configuration case...
>
> 4. How do you get from the static configuration situation to the dynamic
> one? Can it be done with a rolling restart? I think the answer is yes, but
> I wasn't quite sure on reading. Does a leader using the static
> configuration auto-remove voters that aren't in that static config, as well
> as auto-add? The adding behavior is spelled out, but not removing (or maybe
> I missed it).
>
> best,
> Colin
>
>
> On Thu, Jul 21, 2022, at 09:49, José Armando García Sancio wrote:
> > Hi all,
> >
> > I would like to start the discussion on my design to support
> > dynamically changing the set of voters in the KRaft cluster metadata
> > topic partition.
> >
> > KIP URL: https://cwiki.apache.org/confluence/x/nyH1D
> >
> > Thanks!
> > -José
>
>


[DISCUSS] KIP-855: Add schema.namespace parameter to SetSchemaMetadata SMT in Kafka Connect

2022-07-22 Thread Michael Negodaev
Hi all,

I would like to start the discussion on my design to add "schema.namespace"
parameter in SetSchemaMetadata Single Message Transform in Kafka Connect.

KIP URL: https://cwiki.apache.org/confluence/x/CiT1D

Thanks!
-Michael


Re: [DISCUSS] KIP-853: KRaft Voters Change

2022-07-22 Thread José Armando García Sancio
Tom Bentley wrote:
> Thanks for the KIP. As Justine mentioned, this KIP currently lacks a
> motivation, and nor does the JIRA provide any context. Please could you
> provide this context, otherwise it's impossible for people on this list to
> understand the problem you're trying to solve here.

Justine Olshan wrote:
> I was curious a bit more about the motivation here. That section seems to be 
> missing.

I updated the motivation section with the following text:

KIP-595 introduced KRaft topic partitions. These are partitions with
replicas that can achieve consensus on the Kafka log without relying
on the Controller or ZK. The KRaft Controllers in KIP-631 use one of
these topic partitions (called cluster metadata topic partition) to
order operations on the cluster, commit them to disk and replicate
them to other controllers and brokers.

Consensus on the cluster metadata partition was achieved by the voters
(Controllers). If the operator of a KRaft cluster wanted to make
changes to the set of voters, they would have to shut down all of the
controller nodes and manually make changes to the on-disk state of
the old controllers and new controllers. If the operator wanted to
replace an existing voter because of a disk failure or general
hardware failure, they would have to make sure that the new voter node
has a superset of the previous voter's on-disk state. Both of these
solutions are manual and error-prone.

This KIP describes a protocol for extending KIP-595 and KIP-630 so
that the operator can programmatically update the voter set in a way
that is safe and available. There are two important use cases that
this KIP supports. One use case is that the operator wants to change
the number of controllers by adding or removing a controller. The
other use case is that the operator wants to replace a controller
because of a disk or hardware failure.

Thanks!
-- 
-José


Re: [DISCUSS] Apache Kafka 3.2.1 release

2022-07-22 Thread Viktor Somogyi-Vass
Thanks David :)

On Thu, Jul 21, 2022 at 6:00 PM David Arthur  wrote:

> Viktor, seeing as it's been on trunk for a while and is a very small
> change, it seems fine to include in this release. I just finished building
> the RC, but haven't started a vote thread yet. Good timing on your part :)
> I'll merge this PR shortly and start a new RC build.
>
> -David
>
> On Thu, Jul 21, 2022 at 11:42 AM Viktor Somogyi-Vass
>  wrote:
>
> > Hi David,
> >
> > Found an issue (tight loop in the consumer), fixed it on trunk and
> > backported it onto the 3.2 branch. Is it possible to include this in the
> > 3.2.1 release?
> > https://github.com/apache/kafka/pull/12417
> >
> > Thanks,
> > Viktor
> >
> > On Tue, Jul 19, 2022 at 5:57 PM Randall Hauch  wrote:
> >
> > > Hi, Chris S and Chris E,
> > >
> > > Thanks for quickly working on and reviewing the
> > > https://issues.apache.org/jira/browse/KAFKA-14079 issue mentioned
> > > above. The two PRs you created, one for the `trunk` branch and one for
> > > the `3.2` branch, have both been merged, and the issue has been marked
> > > as resolved.
> > >
> > > Best regards,
> > >
> > > Randall
> > >
> > > On Sun, Jul 17, 2022 at 5:44 PM Christopher Shannon <
> > > christopher.l.shan...@gmail.com> wrote:
> > >
> > > > Hi Chris E.
> > > >
> > > > Thanks for all the feedback earlier. I updated the PR based on your
> > > > comments and also pushed a second PR for trunk for 3.3.0.
> > > >
> > > > I agree the impact is high, which is why I found the issue. I recently
> > > > turned on this feature and suddenly my connect worker/task kept
> > > > periodically falling over with OOM errors. Finally I took a heap dump
> > > > and saw a ton of submitted record objects in memory and started
> > > > investigating, and that's how I figured out it was related to this new
> > > > feature.
> > > >
> > > > Chris
> > > >
> > > > On Sun, Jul 17, 2022 at 1:48 PM Chris Egerton <fearthecel...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Chris,
> > > > >
> > > > > Good find, and thanks for filing a fix. I agree that we should get
> > > > > this into 3.2.1 if possible. The risk is fairly low (the functional
> > > > > parts of the fix are just two lines long) and the impact of the bug
> > > > > is high for users who have configured source connectors with
> > > > > "errors.tolerance" set to "all".
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Sat, Jul 16, 2022 at 12:26 PM Christopher Shannon <
> > > > > christopher.l.shan...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I think I found a memory leak that was introduced in 3.2.0 in a
> > > > > > Connector SourceTask. I created a JIRA:
> > > > > > https://issues.apache.org/jira/browse/KAFKA-14079 and a small PR
> > > > > > with a fix:
> > > > > > https://github.com/apache/kafka/pull/12412
> > > > > >
> > > > > > I think this should be included in 3.2.1. It should also go into
> > > > > > 3.3.0, but there was a lot of refactoring done there with the source
> > > > > > task code due to KIP-618, so another PR needs to be done for that if
> > > > > > this is merged.
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On Fri, Jul 15, 2022 at 10:06 AM David Arthur wrote:
> > > > > >
> > > > > > > Here is the release plan for 3.2.1
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.2.1
> > > > > > >
> > > > > > > I am working on getting clarity on the one open blocker. Once
> > > > > > > that is resolved (or rescheduled for a future release), I will
> > > > > > > build the first release candidate.
> > > > > > >
> > > > > > > -David
> > > > > > >
> > > > > > > On Thu, Jul 14, 2022 at 3:10 AM Luke Chen wrote:
> > > > > > >
> > > > > > > > +1, Thanks David!
> > > > > > > >
> > > > > > > > On Thu, Jul 14, 2022 at 1:16 PM David Jacot <david.ja...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1. Thanks David.
> > > > > > > > >
> > > > > > > > > Le mer. 13 juil. 2022 à 23:43, José Armando García Sancio
> > > > > > > > >  a écrit :
> > > > > > > > >
> > > > > > > > > > +1. Thanks for volunteering David.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > -José
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > David Arthur
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> David Arthur
>


Re: [VOTE] 3.2.1 RC3

2022-07-22 Thread Christopher Shannon
+1 (non binding)

I built from source and ran through some of the tests including all the
connect runtime tests. I verified that KAFKA-14079 was included and the fix
looked good in my tests.

On Thu, Jul 21, 2022 at 9:15 PM David Arthur  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first release candidate of Apache Kafka 3.2.1.
>
> This is a bugfix release with several fixes since the release of 3.2.0. A
> few of the major issues include:
>
> * KAFKA-14062 OAuth client token refresh fails with SASL extensions
> * KAFKA-14079 Memory leak in connectors using errors.tolerance=all
> * KAFKA-14024 Cooperative rebalance regression causing clients to get stuck
>
>
> Release notes for the 3.2.1 release:
> https://home.apache.org/~davidarthur/kafka-3.2.1-rc3/RELEASE_NOTES.html
>
>
>
> Please download, test and vote by Wednesday, July 27, 2022 at 17:00 PT.
> 
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~davidarthur/kafka-3.2.1-rc3/
>
> Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> Javadoc: https://home.apache.org/~davidarthur/kafka-3.2.1-rc3/javadoc/
>
> Tag to be voted upon (off 3.2 branch) is the 3.2.1 tag:
> https://github.com/apache/kafka/releases/tag/3.2.1-rc3
>
> Documentation: https://kafka.apache.org/32/documentation.html
>
> Protocol: https://kafka.apache.org/32/protocol.html
>
>
> The past few builds have had flaky test failures. I will update this thread
> with passing build links soon.
>
> Unit/Integration test job:
> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.2/
> System test job:
> https://jenkins.confluent.io/job/system-test-kafka/job/3.2/
>
>
> Thanks!
> David Arthur
>


Re: [DISCUSS] Website changes required for Apache projects

2022-07-22 Thread Bill Bejeck
Hi Divij,

After thinking about the embedded videos some more I think it's probably
best for now to go with option 1 you presented above (text links to the
videos).
I will do a follow on PR for option #2 - creating an image placeholder that
will trigger the video once clicked.

Thanks again for driving this update effort.

-Bill

On Thu, Jul 21, 2022 at 5:25 PM Bill Bejeck  wrote:

> Hi All,
>
> I've filed an issue with INFRA (
> https://issues.apache.org/jira/browse/INFRA-23499) to ask about uploading
> the videos to the ASF YouTube channel, which would resolve the branding
> issue.
>
> Thanks,
> Bill
>
> On Thu, Jul 21, 2022 at 1:43 PM Bill Bejeck  wrote:
>
>> Hi Divij,
>>
>> First of all, let me say thanks for taking up this task.
>>
>> We seem to have two options:
>>> 1. Replace videos on the website with links to the videos OR
>>> 2. Take a placeholder image and use JS to trigger playback after the user
>>> clicks.
>>>
>>> I would suggest going with option #1 right now due to time constraints and
>>> create a ticket to do the (more user-friendly) option #2 in the future.
>>> *What do you think?*
>>>
>>
>> I'm inclined to go with option #2.
>>
>> But taking a look at the https://apache.org/ site, there's an embedded
>> video directly on the page, not an image or a link.
>>
>> So I'm wondering whether, since the video doesn't start playing right away
>> and requires a user to click to start it, the "click image to start"
>> requirement is already satisfied, as it aligns with what we see now on the
>> Apache® Software Foundation page.
>>
>>
>> Regarding the branding, that's not in the video file itself but comes
>> from YouTube and the video's channel.
>>
>> I propose that we host the video on the Apache YouTube channel, and
>> that would take care of the branding issue.
>>
>>
>> What do you think?
>>
>>
>> On Thu, Jul 21, 2022 at 4:19 AM Divij Vaidya 
>> wrote:
>>
>>> Thanks for chiming in with your opinions John/Mickael.
>>>
>>> The current set of videos is very helpful and removing them might be a
>>> disservice to our users. The ideal solution would be to host the
>>> videos on Apache servers without any branding. Another, less than ideal,
>>> solution would be to host a repository of links to educational content on
>>> our website.
>>>
>>> As for the next steps, I am going to do the following, which would help us
>>> get answers on whether solution 1 or solution 2 is more feasible. Please
>>> let me know if you think we need to do something different here.
>>> 1. Reach out to ASF legal and ask what permissions/licence we would
>>> require from the video owners to host the videos ourselves.
>>> 2. Reach out to the ASF community mailing list
>>> <https://www.apache.org/foundation/mailinglists.html#foundation-community>
>>> and ask how other communities are hosting educational content.
>>>
>>> There is still an open question about how we decide what content gets
>>> added and what doesn't. I would propose that the model should be the same
>>> as accepting code changes, i.e. it goes through a community review
>>> requiring votes from committers/PMC members.
>>>
>>> Regards,
>>> Divij Vaidya
>>>
>>>
>>>
>>> On Thu, Jul 21, 2022 at 3:57 AM John Roesler 
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > Yes, thanks Divij for driving this!
>>> >
>>> > I tend to agree with Mickael about having vendor branding
>>> > front-and-center like that.
>>> >
>>> > On the other hand, I think the video itself is quite nice, and
>>> > it's a good thing to put in front of newcomers for a human
>>> > introduction to the project.
>>> >
>>> > I took a look at the video on those pages, and I'm not sure
>>> > if the videos themselves are branded. It looks like the branding
>>> > marks are markup that YouTube pastes on top of the video.
>>> >
>>> > Perhaps a solution is for Kafka to set up a channel of our own
>>> > and upload the videos there? Or maybe just host the videos
>>> > as static resources on our site directly? Approaches like those
>>> > are probably good policy anyway, because then we
>>> > would control the content that shows on our site.
>>> >
>>> > Thanks,
>>> > John
>>> >
>>> > On Tue, Jul 19, 2022, at 11:48, Mickael Maison wrote:
>>> > > Hi Divij,
>>> > >
>>> > > Thanks for leading this work.
>>> > >
>>> > > To be honest I'm not sure what to do with the videos. I'm actually
>>> > > wondering if these videos should be on our website at all.
>>> > >
>>> > > My concern is that they are branded. I find the content of the videos
>>> > > very good but I don't think we should include branded content from
>>> > > vendors on the Apache website, or at least not put it front and
>>> > > center. This is literally the first thing we show to newcomers,
>>> > > there's one at the top of both the Intro
>>> > > (https://kafka.apache.org/intro) and quickstart
>>> > > (https://kafka.apache.org/quickstart) pages.
>>> > >
>>> > > If tomorrow another vendor was to open a PR a

Re: [DISCUSS] Website changes required for Apache projects

2022-07-22 Thread Mickael Maison
Hi,

Don't get me wrong, the videos are great and it's definitely the
type of content we want on the website. We just have to be careful that
all content is vendor neutral. I'm not advocating for introducing new
policies or processes; I think the current PR process should be good
enough.

As noted, in this case the main issue comes from YouTube automatically
adding the channel branding to the videos. Also, in the quickstart and
intro videos Tim says he's from Confluent. The intro he uses in the
Streams videos [0] is in my opinion preferable. If it's possible to
address this without some major editing, I think it would be worth
doing.

Thanks,
Mickael

0: https://kafka.apache.org/32/documentation/streams/

On Fri, Jul 22, 2022 at 4:22 PM Bill Bejeck  wrote:
>
> Hi Divij,
>
> After thinking about the embedded videos some more I think it's probably
> best for now to go with option 1 you presented above (text links to the
> videos).
> I will do a follow on PR for option #2 - creating an image placeholder that
> will trigger the video once clicked.
>
> Thanks again for driving this update effort.
>
> -Bill
>
> On Thu, Jul 21, 2022 at 5:25 PM Bill Bejeck  wrote:
>
> > Hi All,
> >
> > I've filed an issue with INFRA (
> > https://issues.apache.org/jira/browse/INFRA-23499) to ask about uploading
> > the videos to the ASF YouTube channel, which would resolve the branding
> > issue.
> >
> > Thanks,
> > Bill
> >
> > On Thu, Jul 21, 2022 at 1:43 PM Bill Bejeck  wrote:
> >
> >> Hi Divij,
> >>
> >> First of all, let me say thanks for taking up this task.
> >>
> >> We seem to have two options:
> >>> 1. Replace videos on the website with links to the videos OR
> >>> 2. Take a placeholder image and use JS to trigger playback after the user
> >>> clicks.
> >>>
> >>> I would suggest going with option #1 right now due to time constraints and
> >>> create a ticket to do the (more user-friendly) option #2 in the future.
> >>> *What do you think?*
> >>>
> >>
> >> I'm inclined to go with option #2.
> >>
> >> But taking a look at the https://apache.org/ site, there's an embedded
> >> video directly on the page, not an image or a link.
> >>
> >> So I'm wondering whether, since the video doesn't start playing right away
> >> and requires a user to click to start it, the "click image to start"
> >> requirement is already satisfied, as it aligns with what we see now on the
> >> Apache® Software Foundation page.
> >>
> >>
> >> Regarding the branding, that's not in the video file itself but comes
> >> from YouTube and the video's channel.
> >>
> >> I propose that we host the video on the Apache YouTube channel, and
> >> that would take care of the branding issue.
> >>
> >>
> >> What do you think?
> >>
> >>
> >> On Thu, Jul 21, 2022 at 4:19 AM Divij Vaidya 
> >> wrote:
> >>
> >>> Thanks for chiming in with your opinions John/Mickael.
> >>>
> >>> The current set of videos is very helpful and removing them might be a
> >>> disservice to our users. The ideal solution would be to host the
> >>> videos on Apache servers without any branding. Another, less than ideal,
> >>> solution would be to host a repository of links to educational content on
> >>> our website.
> >>>
> >>> As for the next steps, I am going to do the following, which would help us
> >>> get answers on whether solution 1 or solution 2 is more feasible. Please
> >>> let me know if you think we need to do something different here.
> >>> 1. Reach out to ASF legal and ask what permissions/licence we would
> >>> require from the video owners to host the videos ourselves.
> >>> 2. Reach out to the ASF community mailing list
> >>> <https://www.apache.org/foundation/mailinglists.html#foundation-community>
> >>> and ask how other communities are hosting educational content.
> >>>
> >>> There is still an open question about how we decide what content gets
> >>> added and what doesn't. I would propose that the model should be the same
> >>> as accepting code changes, i.e. it goes through a community review
> >>> requiring votes from committers/PMC members.
> >>>
> >>> Regards,
> >>> Divij Vaidya
> >>>
> >>>
> >>>
> >>> On Thu, Jul 21, 2022 at 3:57 AM John Roesler 
> >>> wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > Yes, thanks Divij for driving this!
> >>> >
> >>> > I tend to agree with Mickael about having vendor branding
> >>> > front-and-center like that.
> >>> >
> >>> > On the other hand, I think the video itself is quite nice, and
> >>> > it's a good thing to put in front of newcomers for a human
> >>> > introduction to the project.
> >>> >
> >>> > I took a look at the video on those pages, and I'm not sure
> >>> > if the videos themselves are branded. It looks like the branding
> >>> > marks are markup that YouTube pastes on top of the video.
> >>> >
> >>> > Perhaps a solution is for Kafka to set up a channel of our own
> >>> > and upload the videos there? Or maybe just host the videos
> >>> > as static reso

Re: [DISCUSS] KIP-853: KRaft Voters Change

2022-07-22 Thread José Armando García Sancio
Thanks Niket for your feedback. I have made changes to the KIP and
replied to your comments below.


Niket Goel wrote:
> > This UUID will be generated once and persisted as part of the quorum state 
> > for the topic partition
> Do we mean that it will be generated every time the disk on the replica is 
> primed (so every disk incarnation will have UUID). I think you describe it in 
> a section further below. Best to pull it up to here — “the replica uuid is 
> automatically generated once by the replica when persisting the quorum state 
> for the first time.”

Updated the Replica UUID section to better describe when it will be
generated and how it will be persisted.

> > If there are any pending voter change operations the leader will wait for 
> > them to finish.
> Will new requests be rejected or queued up behind the pending operation. (I 
> am assuming rejected, but want to confirm)

Either solution is correct but I think that the administrator would
prefer for the operation to get held until it can get processed or it
times out.

> > When this option is used the leader of the KRaft topic partition will not
> > allow the AddVoter RPC to add replica IDs that are not described in the
> > configuration and it would not allow the RemoveVoter RPC to remove replica
> > IDs that are described in the configuration.
> Bootstrapping is a little tricky I think. Is it safer/simpler to say that
> any add/remove RPC operations are blocked until all nodes in the config are
> processed? The way it is worded above makes it seem like the leader will
> accept adds of the same node from outside. Is that the case?

Updated the last sentence of that section to the following:

The KRaft leader will not perform writes from the state machine
(active controller) or client until it has written to the log an
AddVoterRecord for every replica id in the controller.quorum.voters
configuration.

>
> > The KRaft leader will not perform writes from the state machine (active
> > controller) until it has written to the log an AddVoterRecord for every
> > replica id in the controller.quorum.voters configuration.
> Just thinking through - One of the safety requirements for the protocol is 
> for a leader to commit at least one write in an epoch before doing config 
> changes, right? In this special case we should be ok because the quorum has 
> no one but the leader in the beginning. Is that the thought process?

This should be safe because in this configuration the log will be
empty until all of the AddVoterRecords are persisted in a RecordBatch.
RecordBatches are atomic.

>
> > controller.quorum.bootstrap.servers vs controller.quorum.voters
> I understand the use of quorum.voters, but the bootstrap.servers is not 
> entirely clear to me. So in the example of starting the cluster with one 
> voter, will that one voter be listed here? And when using this configuration, 
> is the expectation that quorum.voters is empty, or will it eventually get 
> populated with the new quorum members?

These two configurations are mutually exclusive. The Kafka cluster is
expected to use one or the other. Kafka configuration validation will
fail if both are set. Kafka doesn't automatically update
configurations.
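
As a rough illustration of that rule, the check could look like the following Python sketch. The function name and the plain-dict configuration are hypothetical; Kafka's real validation lives in its own configuration classes and may differ. Treating "neither property set" as an error is also an assumption, not something stated in the KIP discussion.

```python
VOTERS = "controller.quorum.voters"
BOOTSTRAP = "controller.quorum.bootstrap.servers"


def validate_quorum_config(config: dict) -> str:
    """Return the name of the property in use, or raise if the combination is invalid."""
    voters = config.get(VOTERS, "").strip()
    bootstrap = config.get(BOOTSTRAP, "").strip()

    # The two properties are mutually exclusive: setting both fails validation.
    if voters and bootstrap:
        raise ValueError(f"{VOTERS} and {BOOTSTRAP} are mutually exclusive")

    # Assumption for this sketch: at least one of them must be present.
    if not voters and not bootstrap:
        raise ValueError(f"one of {VOTERS} or {BOOTSTRAP} must be set")

    return VOTERS if voters else BOOTSTRAP
```

Because Kafka does not rewrite configuration files, a check like this would run against whatever the operator supplied at startup.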

> e.g. further in the kip we say — “Replica 3 will discover the partition 
> leader using controller.quorum.voters”; so I guess it will be populated?

That example assumes that the cluster is configured to use
controller.quorum.voters: "Let's assume that the cluster is configured
to use controller.quorum.voters and the value is
1@host1:9092,2@host2:9092,3@host3:9094."
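
For readers unfamiliar with the `id@host:port` format of that value, here is a small Python sketch that parses it. The `Voter` type and `parse_quorum_voters` function are illustrative names only, and rejecting duplicate IDs is an assumption made for the sketch.

```python
from typing import Dict, NamedTuple


class Voter(NamedTuple):
    replica_id: int
    host: str
    port: int


def parse_quorum_voters(value: str) -> Dict[int, Voter]:
    """Parse a comma-separated list of id@host:port entries, keyed by replica id."""
    voters: Dict[int, Voter] = {}
    for entry in value.split(","):
        id_part, _, endpoint = entry.strip().partition("@")
        # rpartition keeps any earlier ':' characters in the host part.
        host, _, port = endpoint.rpartition(":")
        replica_id = int(id_part)
        if replica_id in voters:
            # Assumption: duplicate replica ids are treated as a configuration error.
            raise ValueError(f"duplicate replica id {replica_id}")
        voters[replica_id] = Voter(replica_id, host, int(port))
    return voters
```

Parsing the example value yields three voters with IDs 1, 2 and 3; replica 3 would look up the endpoints of the other voters in this map to discover the leader.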

>
> > This check will be removed and replicas will reply to vote requests when
> > the candidate is not in the voter set or the voting replica is not in the 
> > voter set.
> This is a major change IMO and I think it would be good if we could somehow 
> highlight it in the KIP to aid a future reader.

Hmm. All of the ideas and changes in the Proposed Changes section are
required and important for this feature to be correct and safe. The
Leader Election section highlights this change to the Vote RPC. The
Vote section later on in the document goes into more details.

> > This also means that the KRaft implementation needs to handle this 
> > uncommitted state getting truncated and reverted.
> Do we need to talk about the specific behavior a little more here? I mean, how
> does this affect any inflight messages with quorums moving between different
> values? (Just a brief explanation of why it works.)

I think that this requires going into how KafkaRaftClient is
implemented. I don't think we should do that in the KIP. I think that
it is better discussed during the implementation and PR review
process. The KIP and this section highlight that the implementation
needs to handle the voter set changing either because a log record was
read or because a log record was truncated.

> > This state can be discovered by a client by using the DescribeQuorum RPC, 
> > the Admin client or th

[GitHub] [kafka-site] mimaison merged pull request #420: KAFKA-13868: Self host fonts with project website

2022-07-22 Thread GitBox


mimaison merged PR #420:
URL: https://github.com/apache/kafka-site/pull/420




Re: [DISCUSS] KIP-853: KRaft Voters Change

2022-07-22 Thread José Armando García Sancio
Jack Vanlightly wrote:
> - Regarding the removal of voters, when a leader appends a
> RemoveVoterRecord to its log, it immediately switches to the new
> configuration. There are two cases here:
> 1. The voter being removed is the leader itself. The KIP documents that
> the followers will continue to fetch from the leader despite these
> followers learning that the leader has been removed from the voter set.
> This is correct. It might be worth stating the reason: otherwise the
> cluster can get blocked from making further progress, since the followers
> must keep fetching from the leader for the leader to be able to commit the
> record and resign.

Yes. The KIP now reads:
To allow this operation to be committed and for the leader to resign
the followers will continue to fetch from the leader even if the
leader is not part of the new voter set. In KRaft, leader election is
triggered when a voter hasn't received a successful fetch response
within the fetch timeout.

> 2. The voter being removed is not the leader. We should document that
> the leader will accept fetches with replica IDs (i.e. not observer
> fetches) from followers who are not in the voter set of the leader. This
> will occur because the voter being removed won't have received the
> RemoveVoterRecord yet and it must be allowed to reach this record so that:

In KRaft an observer is any replica with an (ID, UUID) or without an
(ID, UUID) that is not part of the voter set. I added a Key Terms
section with the definition of important terms that are used in the
KRaft implementation and KIPs. I'll add more terms to that section as
the discussion continues.
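
That definition can be written down as a short Python sketch. `ReplicaKey` and `is_observer` are hypothetical names used only for illustration; they are not taken from the KRaft implementation.

```python
from typing import NamedTuple, Optional, Set


class ReplicaKey(NamedTuple):
    replica_id: int
    replica_uuid: str


def is_observer(replica: Optional[ReplicaKey], voter_set: Set[ReplicaKey]) -> bool:
    """A replica is an observer when it is not part of the voter set.

    A replica without an (ID, UUID) is passed as None and can never be a voter.
    """
    return replica is None or replica not in voter_set
```

Note that membership is checked on the whole (ID, UUID) pair, so in this sketch a replica that reuses a voter's ID with a different UUID is still an observer.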

> - Regarding what happens when a voter leaves the voter set. When a
> non-leader voter reaches a RemoveVoterRecord where it is the subject of the
> removal, does it switch to being an observer and continue fetching? When a
> leader is removed, it carries on until it has committed the record or an
> election occurs electing a different leader. Until that point, it is
> leader but not a voter, so that makes it an observer? After it has
> committed the record and resigns (or gets deposed by an election) does it
> then start fetching as an observer?

Yeah. Good points. The AddVoterRecord and RemoveVoterRecord sections
were missing Handling subsections. I added those sections
and they go into this detail. I should point out that observer is not
technically a state in KRaft. The better way to think about it is that
the voter set determines which states, specifically the candidate
state, a follower is allowed to transition to.

> - I think the Add/RemoveVoterRecords should also include the current voter
> set. This will make it easier to code against and also make
> troubleshooting easier. Else voter membership can only be reliably
> calculated by replaying the whole log.

Yeah. The reality of the implementation is that the replicas will have
to read the entire snapshot and log before they can determine the
voter set. I have concerns that by adding this field it will
unnecessarily complicate the snapshotting logic since it will have to
remember which AddVoterRecords were already appended to the snapshot.
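
To make the "read the entire snapshot and log" point concrete, here is a minimal Python sketch of deriving the voter set by replay. The record shape, a `(record_type, replica)` tuple, is an assumption for illustration and does not reflect the actual record schemas in the KIP.

```python
from typing import Iterable, Set, Tuple

Replica = Tuple[int, str]          # (replica id, replica UUID)
Record = Tuple[str, Replica]       # (record type, replica the record refers to)


def replay_voter_set(records: Iterable[Record]) -> Set[Replica]:
    """Derive the voter set by applying every voter-change record in order."""
    voters: Set[Replica] = set()
    for record_type, replica in records:
        if record_type == "AddVoterRecord":
            voters.add(replica)
        elif record_type == "RemoveVoterRecord":
            voters.discard(replica)
        # Any other record type does not affect the voter set in this sketch.
    return voters
```

Since each record carries only a delta, the result depends on every earlier record, which is why membership cannot be read off a single record.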

I think we can make the topic partition snapshot and log more
debuggable by improving the kafka-metadata-shell. It is not part of
this KIP but I hope to write a future KIP that describes how the
kafka-metadata-shell displays information about the cluster metadata
partition.

> - Regarding the adding of voters:
> 1. We should document the problem of adding a new voter which then
> causes all progress to be blocked until the voter catches up with the
> leader. For example, in a 3 node cluster, we lose 1 node. We add a new node
> which means we have a majority = 3, with only 3 functioning nodes. Until
> the new node has caught up, the high watermark cannot advance. This can be
> solved by ensuring that to add a node we start it first as an observer and
> once it has caught up, send the AddVoter RPC to the leader. This leads to
> the question of how an observer determines that it has caught up.

Yes. I have the following in AddVoter Handling section:


--- Start of Section RPCs/AddVoter/Handling from KIP ---
When the leader receives an AddVoter request it will do the following:

1. Wait for the fetch offset of the replica (ID, UUID) to catch up to
the log end offset of the leader.
2. Wait until there are no uncommitted add or remove voter records.
3. Append the AddVoterRecord to the log.
4. The KRaft internal listener will read this record from the log and
add the voter to the voter set.
5. Wait for the AddVoterRecord to commit using the majority of the new
configuration.
6. Send the AddVoter response to the client.
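
The reason for the catch-up wait in step 1 can be illustrated with a simplified Python model of the high watermark, taken here as the largest offset replicated on a majority of voters. This is only a sketch: the function name and the offsets are made up, and the real implementation tracks the high watermark differently (for example, it never moves backwards).

```python
from typing import Dict


def high_watermark(voter_offsets: Dict[int, int]) -> int:
    """Largest offset that a majority of the voters have replicated."""
    offsets = sorted(voter_offsets.values(), reverse=True)
    majority = len(offsets) // 2 + 1
    # The majority-th largest offset is reached by at least `majority` voters.
    return offsets[majority - 1]


# Three voters, voter 3 has failed and is stuck at offset 40: majority is 2.
before = high_watermark({1: 100, 2: 100, 3: 40})

# Adding voter 4 while it is still at offset 0 raises the majority to 3.
added_behind = high_watermark({1: 100, 2: 100, 3: 40, 4: 0})

# Adding voter 4 only after it has caught up keeps commits flowing.
added_caught_up = high_watermark({1: 100, 2: 100, 3: 40, 4: 100})
```

In this model `before` and `added_caught_up` are both 100, while `added_behind` is 40: with the lagging voter counted, nothing past offset 40 is on a majority, so no new data can commit until the new voter catches up.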

In step 1, the leader needs to wait for the replica to catch up because
when the AddVoterRecord is appended to the log the set of voters
changes. If the added voter is far behind then it can take some time
for it to reach the HWM. During this time the leader cannot commit
data and the quorum wil

[jira] [Created] (KAFKA-14098) Internal Kafka clients used by Kafka Connect should have distinguishable client IDs

2022-07-22 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-14098:
----------------------------------

    Summary: Internal Kafka clients used by Kafka Connect should have
             distinguishable client IDs
        Key: KAFKA-14098
        URL: https://issues.apache.org/jira/browse/KAFKA-14098
    Project: Kafka
 Issue Type: Improvement
 Components: KafkaConnect
   Reporter: Chris Egerton
   Assignee: Chris Egerton


KAFKA-5061 dealt with the lack of automatically-provided client IDs for the 
Kafka clients used for source and sink tasks, and has been addressed for some 
time now. Additionally, when new features have required new Kafka clients to be 
brought up for tasks (such as the need for an admin client to create topics for 
source tasks introduced by 
[KIP-158|https://cwiki.apache.org/confluence/display/KAFKA/KIP-158%3A+Kafka+Connect+should+allow+source+connectors+to+set+topic-specific+settings+for+new+topics]),
 we have taken care to ensure that these clients are also given meaningful 
client IDs.

However, the internal clients used by Kafka Connect workers to create, consume 
from, and produce to internal topics do not have automatically-provided client 
IDs at the moment, and it is up to users to manually supply them. Worse yet, 
even if a user does manually supply a client ID for their Connect cluster's 
internal clients (by setting the {{client.id}} property in their worker 
configuration), there is no distinction made between the clients created for 
interacting with different topics.

If no {{client.id}} property is set in the worker config, Kafka Connect should
automatically provide client IDs for its internal clients that include the
group ID of the cluster (if running in distributed mode) and the purpose of the
client (such as {{statuses}}, {{configs}}, or {{offsets}}).
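
One possible shape for such IDs, as a hedged Python sketch: the `connect-cluster` prefix, the `-` separator and the function name are all assumptions for illustration, since the ticket does not specify a format.

```python
from typing import Optional


def internal_client_id(purpose: str,
                       group_id: Optional[str] = None,
                       configured_client_id: Optional[str] = None) -> str:
    """Build a client ID for one of a Connect worker's internal clients."""
    # A user-supplied client.id is left untouched; IDs are only generated when
    # none is configured.
    if configured_client_id:
        return configured_client_id

    parts = ["connect-cluster"]        # hypothetical prefix
    if group_id:                       # only present in distributed mode
        parts.append(group_id)
    parts.append(purpose)              # e.g. "statuses", "configs", "offsets"
    return "-".join(parts)
```

With a scheme like this, the clients for the status, config and offset topics of the same cluster would each get a distinct ID.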


