Re: [DISCUSS] KIP-971 Expose replication-offset-lag MirrorMaker2 metric

2024-01-18 Thread Mickael Maison
Hi Elxan,

Thanks for the updates.

We use dots to separate words in configuration names, so I think
replication.offset.lag.metric.last-replicated-offset.ttl should be
named replication.offset.lag.metric.last.replicated.offset.ttl
instead.

About the names of the metrics, fair enough if you prefer keeping the
replication prefix. Out of the alternatives you mentioned, I think I
prefer replication-record-lag. I think the metrics and configuration
names should match too. Let's see what the others think about it.

Thanks,
Mickael

On Mon, Jan 15, 2024 at 9:50 PM Elxan Eminov  wrote:
>
> Apologies, forgot to reply on your last comment about the metric name.
> I believe both replication-lag and record-lag are a little too abstract -
> what do you think about either leaving it as replication-offset-lag or
> renaming to replication-record-lag?
>
> Thanks
>
> On Wed, 10 Jan 2024 at 15:31, Mickael Maison 
> wrote:
>
> > Hi Elxan,
> >
> > Thanks for the KIP, it looks like a useful addition.
> >
> > Can you add to the KIP the default value you propose for
> > replication.lag.metric.refresh.interval? In MirrorMaker most interval
> > configs can be set to -1 to disable them, will it be the case for this
> > new feature or will this setting only accept positive values?
> > I also wonder if replication-lag, or record-lag would be clearer names
> > instead of replication-offset-lag, WDYT?
> >
> > Thanks,
> > Mickael
> >
> > On Wed, Jan 3, 2024 at 6:15 PM Elxan Eminov 
> > wrote:
> > >
> > > Hi all,
> > > Here is the vote thread:
> > > https://lists.apache.org/thread/ftlnolcrh858dry89sjg06mdcdj9mrqv
> > >
> > > Cheers!
> > >
> > > On Wed, 27 Dec 2023 at 11:23, Elxan Eminov 
> > wrote:
> > >
> > > > Hi all,
> > > > I've updated the KIP with the details we discussed in this thread.
> > > > I'll call in a vote after the holidays if everything looks good.
> > > > Thanks!
> > > >
> > > > On Sat, 26 Aug 2023 at 15:49, Elxan Eminov 
> > > > wrote:
> > > >
> > > >> Relatively minor change with a new metric for MM2
> > > >>
> > > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-971%3A+Expose+replication-offset-lag+MirrorMaker2+metric
> > > >>
> > > >
> >


Re: [DISCUSS] KIP-1016 Make MM2 heartbeats topic name configurable

2024-01-18 Thread Kondrát Bertalan
Hi Viktor,

Let me address your points one by one.

   1. The current implementation does not support per source->target pair
   configuration; it is global.
   2. Yes, I introduced that property both in the client and in the
   connectors.
   3. This is a great idea; I am going to do that. I also tried to construct
   the property name in a way that makes this clear for users:
   'default.replication.policy.heartbeats.topic.name' (see the sketch after
   this list).
   4. Yeah, that was my impression too.
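
To illustrate point 3: the new property only takes effect with the
DefaultReplicationPolicy. A rough sketch (assuming the heartbeatsTopic() hook
that KIP-690 added to ReplicationPolicy; this is illustrative and not part of
the PR):

import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;

// A custom policy like this hard-codes its own heartbeats topic, so the
// proposed default.replication.policy.heartbeats.topic.name property has no
// effect on it -- only DefaultReplicationPolicy would read that property.
public class StaticHeartbeatsReplicationPolicy extends DefaultReplicationPolicy {
    @Override
    public String heartbeatsTopic() {
        return "my-heartbeats";
    }
}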

Thanks,
Berci

On Wed, Jan 17, 2024 at 4:51 PM Viktor Somogyi-Vass
 wrote:

> Hi Bertalan,
>
> Thanks for creating this KIP.
> A couple of observations/questions:
> 1. If I have multiple source->target pairs, can I set this property per
> cluster by prefixing with "source->target" as many other configs or is it
> global?
> 2. The replication policy must be set in MirrorClient as well. Is your
> change applicable to both MirrorClient and the connectors as well?
> 3. It might be worth pointing out (both in the docs and the KIP) that if
> the user overrides the replication policy to any other than
> DefaultReplicationPolicy, then this config has no effect.
> 4. With regards to integration tests, I tend to think we don't need them
> if we can cover this well with unit tests and mocking.
>
> Thanks,
> Viktor
>
> On Wed, Jan 17, 2024 at 12:23 AM Ryanne Dolan 
> wrote:
>
> > Makes sense to me, +1.
> >
> > On Tue, Jan 16, 2024 at 5:04 PM Kondrát Bertalan 
> > wrote:
> >
> >> Hey Team,
> >>
> >> I would like to start a discussion thread about KIP-1016: Make MM2
> >> heartbeats topic name configurable.
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1016+Make+MM2+heartbeats+topic+name+configurable
> >>
> >> This KIP aims to make the default heartbeat topic name (`heartbeats`) in
> >> the DefaultReplicationPolicy configurable via a property.
> >> Since this is my first KIP and the change is small, I implemented it in
> >> advance so I can include the PR as well.
> >>
> >> I appreciate all your feedback and comments.
> >>
> >> Special thanks to Viktor Somogyi-Vass  and
> >> Daniel
> >> Urban  for the original idea and help.
> >> Thank you,
> >> Berci
> >>
> >> --
> >> *Bertalan Kondrat* | Founder, SWE
> >> servy.hu 
> >>
> >>
> >>
> >> 
> >> --
> >>
> >
>


-- 
*Bertalan Kondrat* | Founder
t. +36(70) 413-4801
servy.hu 


--


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.6 #139

2024-01-18 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16162) New created topics are unavailable after upgrading to 3.7

2024-01-18 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16162:
-

 Summary: New created topics are unavailable after upgrading to 3.7
 Key: KAFKA-16162
 URL: https://issues.apache.org/jira/browse/KAFKA-16162
 Project: Kafka
  Issue Type: Bug
Reporter: Luke Chen


In 3.7, we introduced the KIP-858 JBOD feature, and the brokerRegistration 
request will include the `LogDirs` fields with UUID for each log dir in each 
broker. This info will be stored in the controller and used to identify if the 
log dir is known and online while handling AssignReplicasToDirsRequest 
[here|https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L2093].
 

When upgrading from an old version, the Kafka cluster first runs the 3.7 binary with
the old metadata version, and is then upgraded to the newer metadata version using
kafka-features.sh. That means that when the brokers start up and send the
brokerRegistration request, they use the older metadata version, without the
`LogDirs` fields included, so the controller has no log dir info for any broker.
Later, after the upgrade, if a new topic is created, the flow goes like this:

1. The controller assigns replicas and adds the assignment to the metadata log
2. Brokers fetch the metadata and apply it
3. ReplicaManager#maybeUpdateTopicAssignment updates the topic assignment
4. After a broker sends ASSIGN_REPLICAS_TO_DIRS to the controller with the replica
assignment, the controller considers the log dir of that replica offline, so it
triggers the offline handler and reassigns the replica, which is again considered
offline, until there are no more replicas to assign and the leader is set to -1
(i.e. no leader)

So the result is that newly created topics are unavailable (with no
leader) because the controller thinks all log dirs are offline.

{code:java}
lukchen@lukchen-mac kafka % bin/kafka-topics.sh --describe --topic quickstart-events3 --bootstrap-server localhost:9092

Topic: quickstart-events3   TopicId: s8s6tEQyRvmjKI6ctNTgPg   PartitionCount: 3   ReplicationFactor: 3   Configs: segment.bytes=1073741824
    Topic: quickstart-events3   Partition: 0   Leader: none   Replicas: 7,2,6   Isr: 6
    Topic: quickstart-events3   Partition: 1   Leader: none   Replicas: 2,6,7   Isr: 6
    Topic: quickstart-events3   Partition: 2   Leader: none   Replicas: 6,7,2   Isr: 6
{code}

The log snippet in the controller:


{code:java}
# handling 1st assignReplicaToDirs request

[2024-01-18 19:34:47,370] DEBUG [QuorumController id=1] Broker 6 assigned 
partition quickstart-events3:0 to OFFLINE dir 7K5JBERyyqFFxIXSXYluJA 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-01-18 19:34:47,370] DEBUG [QuorumController id=1] Broker 6 assigned 
partition quickstart-events3:2 to OFFLINE dir 7K5JBERyyqFFxIXSXYluJA 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-01-18 19:34:47,371] DEBUG [QuorumController id=1] Broker 6 assigned 
partition quickstart-events3:1 to OFFLINE dir 7K5JBERyyqFFxIXSXYluJA 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-01-18 19:34:47,372] DEBUG [QuorumController id=1] offline-dir-assignment: 
changing partition(s): quickstart-events3-0, quickstart-events3-2, 
quickstart-events3-1 (org.apache.kafka.controller.ReplicationControlManager)
[2024-01-18 19:34:47,372] DEBUG [QuorumController id=1] partition change for 
quickstart-events3-0 with topic ID 6ZIeidfiSTWRiOAmGEwn_g: directories: 
[AA, AA, AA] -> 
[7K5JBERyyqFFxIXSXYluJA, AA, AA], 
partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)
[2024-01-18 19:34:47,372] DEBUG [QuorumController id=1] Replayed partition 
change PartitionChangeRecord(partitionId=0, topicId=6ZIeidfiSTWRiOAmGEwn_g, 
isr=null, leader=-2, replicas=null, removingReplicas=null, addingReplicas=null, 
leaderRecoveryState=-1, directories=[7K5JBERyyqFFxIXSXYluJA, 
AA, AA], eligibleLeaderReplicas=null, 
lastKnownELR=null) for topic quickstart-events3 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-01-18 19:34:47,372] DEBUG [QuorumController id=1] partition change for 
quickstart-events3-2 with topic ID 6ZIeidfiSTWRiOAmGEwn_g: directories: 
[AA, AA, AA] -> 
[AA, 7K5JBERyyqFFxIXSXYluJA, AA], 
partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)
[2024-01-18 19:34:47,372] DEBUG [QuorumController id=1] Replayed partition 
change PartitionChangeRecord(partitionId=2, topicId=6ZIeidfiSTWRiOAmGEwn_g, 
isr=null, leader=-2, replicas=null, removingReplicas=null, addingReplicas=null, 
leaderRecoveryState=-1, directories=[AA, 
7K5JBERyyqFFxIXSXYluJA, AA], eligibleLeaderReplicas=null, 
l

Re: [VOTE] 3.7.0 RC2

2024-01-18 Thread Luke Chen
Hi all,

I think I've found another blocker issue: KAFKA-16162.
The impact is that after upgrading to 3.7.0, any newly created topics/partitions
will be unavailable.
I've put my findings in the JIRA.

Thanks.
Luke

On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax  wrote:

> Stan, thanks for driving this all forward! Excellent job.
>
> About
>
> > StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141
> > StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139
>
> For `StreamsUpgradeTest` it was a test setup issue and should be fixed
> now in trunk and 3.7 (and actually also in 3.6...)
>
> For `StreamsStandbyTask` the failing test exposes a regression bug, so
> it's a blocker. I updated the ticket accordingly. We already have an
> open PR that reverts the code introducing the regression.
>
>
> -Matthias
>
> On 1/17/24 9:44 AM, Proven Provenzano wrote:
> > We have another blocking issue for the RC :
> > https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar
> to
> > https://issues.apache.org/jira/browse/KAFKA-14616. The new issue however
> > can lead to the new topic having partitions that a producer cannot write
> to.
> >
> > --Proven
> >
> > On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano <
> pprovenz...@confluent.io>
> > wrote:
> >
> >>
> >> I have a PR https://github.com/apache/kafka/pull/15197 for
> >> https://issues.apache.org/jira/browse/KAFKA-16131 that is building now.
> >> --Proven
> >>
> >> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:
> >>
> >>> > Hi Jakub,
> >>> >
> >>> > Thanks for trying the RC. I think what you found is a blocker bug because
> >>> > it will generate a huge amount of logspam. I guess we didn't find it in
> >>> > junit tests since logspam doesn't fail the automated tests. But certainly
> >>> > it's not suitable for production. Did you file a JIRA yet?
> >>>
> >>> Hi Colin,
> >>>
> >>> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
> >>>
> >>> Thanks & Regards
> >>> Jakub
> >>>
> >>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe 
> wrote:
> >>>
>  Hi Stanislav,
> 
>  Thanks for making the first RC. The fact that it's titled RC2 is
> messing
>  with my mind a bit. I hope this doesn't make people think that we're
>  farther along than we are, heh.
> 
>  On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> > > Nice catch! It does seem like we should have gated this behind the
> > > metadata version as KIP-858 implies. Is the cluster configured with
> > > multiple log dirs? What is the impact of the error messages?
> >
> > I did not observe any obvious impact. I was able to send and receive
> > messages as normal. But to be honest, I have no idea what else
> > this might impact, so I did not try anything special.
> >
> > I think everyone upgrading an existing KRaft cluster will go through this
> > stage (running Kafka 3.7 with an older metadata version for at least a
> > while). So even if it is just a logged exception without any other impact I
> > wonder if it might scare users from upgrading. But I leave it to others to
> > decide if this is a blocker or not.
> >
> 
>  Hi Jakub,
> 
>  Thanks for trying the RC. I think what you found is a blocker bug because
>  it will generate a huge amount of logspam. I guess we didn't find it in junit
>  tests since logspam doesn't fail the automated tests. But certainly it's
>  not suitable for production. Did you file a JIRA yet?
> 
> > On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
> >  wrote:
> >
> >> Hey Luke,
> >>
> >> This is an interesting problem. Given the fact that the KIP for having a
> >> 3.8 release passed, I think it tips the scale towards not calling this a
> >> blocker and expecting it to be solved in 3.7.1.
> >>
> >> It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
> >> (given the inability to roll back safely), but if that's true - the same
> >> case would apply for 3.6.0. So in any case users would be expected to use a
> >> patch release for this.
> 
>  Hi Luke,
> 
>  Thanks for testing rollback. I think this is a case where the
>  documentation is wrong. The intention was for the steps to basically be:
> 
>  1. roll all the brokers into zk mode, but with migration enabled
>  2. take down the kraft quorum
>  3. rmr /controller, allowing a hybrid broker to take over.
>  4. roll all the brokers into zk mode without migration enabled (if desired)
> 
>  With these steps, there isn't really any unavailability since a ZK controller
>  can be elected quickly after the kraft quorum is gone.
> 
> >> Further, since we will have a 3.8 release - 

Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2584

2024-01-18 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-971: Expose replication-offset-lag MirrorMaker2 metric

2024-01-18 Thread Edoardo Comar
Hi Elxan,

+1 (binding).

Thanks,
Edo

On Wed, 10 Jan 2024 at 14:01, Viktor Somogyi-Vass
 wrote:
>
> Hi Elxan,
>
> +1 (binding).
>
> Thanks,
> Viktor
>
> On Mon, Jan 8, 2024 at 5:57 PM Dániel Urbán  wrote:
>
> > Hi Elxan,
> > +1 (non-binding)
> > Thanks for the KIP, this will be a very useful metric for MM!
> > Daniel
> >
> > On Sun, Jan 7, 2024 at 2:17, Elxan Eminov  wrote:
> >
> > > Hi all,
> > > Bumping this for visibility
> > >
> > > On Wed, 3 Jan 2024 at 18:13, Elxan Eminov 
> > wrote:
> > >
> > > > Hi All,
> > > > I'd like to initiate a vote for KIP-971.
> > > > This KIP is about adding a new metric to the MirrorSourceTask that
> > tracks
> > > > the offset lag between a source and a target partition.
> > > >
> > > > KIP link:
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-971%3A+Expose+replication-offset-lag+MirrorMaker2+metric
> > > >
> > > > Discussion thread:
> > > > https://lists.apache.org/thread/gwq9jd75dnm8htmpqkn17bnks6h3wqwp
> > > >
> > > > Thanks!
> > > >
> > >
> >


Re: DISCUSS KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh

2024-01-18 Thread Chris Egerton
Thanks Ziming, LGTM!

On Mon, Jan 15, 2024 at 12:00 AM ziming deng 
wrote:

> Hello Luke,
>
> thank you for finding this error, I have rectified it, and I will start a
> vote process soon.
>
> --
> Best,
> Ziming
>
>
> > On Jan 12, 2024, at 16:32, Luke Chen  wrote:
> >
> > Hi Ziming,
> >
> > Thanks for the KIP!
> > LGTM!
> > Using incremental by default and falling back automatically if it's not
> > supported is a good idea!
> >
> > One minor comment:
> > 1. so I'm inclined to move it to incrementalAlterConfigs  and "provide a
> > flag" to still use alterConfigs  for new client to interact with old
> > servers.
> > I don't think we will "provide any flag" after the discussion. We should
> > remove it.
> >
> > Thanks.
> > Luke
> >
> > On Fri, Jan 12, 2024 at 12:29 PM ziming deng  >
> > wrote:
> >
> >> Thank you for your clarification, Chris,
> >>
> >> I have spent some time reviewing KIP-894 and I think its automatic approach is
> >> better and brings no side effects, and I will adopt the same approach here.
> >> As you mentioned, the changes in semantics are minor; the most important
> >> reason for this change is fixing the bug caused by sensitive configs.
> >>
> >>
> >>> We
> >>> don't appear to support appending/subtracting from list properties via
> >> the
> >>> CLI for any other entity type right now,
> >> You are right about this; I tried and found that we can’t subtract or
> >> append configs, so I will change the KIP to "making way for
> >> appending/subtracting list properties".
> >>
> >> --
> >> Best,
> >> Ziming
> >>
> >>> On Jan 6, 2024, at 01:34, Chris Egerton 
> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Can we clarify any changes in the user-facing semantics for the CLI
> tool
> >>> that would come about as a result of this KIP? I think the debate over
> >> the
> >>> necessity of an opt-in flag, or waiting for 4.0.0, ultimately boils
> down
> >> to
> >>> this.
> >>>
> >>> My understanding is that the only changes in semantics are fairly minor
> >>> (semantic versioning pun intended):
> >>>
> >>> - Existing sensitive broker properties no longer have to be explicitly
> >>> specified on the command line if they're not being changed
> >>> - A small race condition is fixed where the broker config is updated
> by a
> >>> separate operation in between when the CLI reads the existing broker
> >> config
> >>> and writes the new broker config
> >>> - Usage of a new broker API that has been supported since version
> 2.3.0,
> >>> but which does not require any new ACLs and does not act any
> differently
> >>> apart from the two small changes noted above
> >>>
> >>> If this is correct, then I'm inclined to agree with Ismael's suggestion
> >> of
> >>> starting with incrementalAlterConfigs, and falling back on alterConfigs
> >> if
> >>> the former is not supported by the broker, and do not believe it's
> >>> necessary to wait for 4.0.0 or provide opt-in or opt-out flags to
> release
> >>> this change. This would also be similar to changes we made to
> >> MirrorMaker 2
> >>> in KIP-894 [1], where the default behavior for syncing topic configs is
> >> now
> >>> to start with incrementalAlterConfigs and fall back on alterConfigs if
> >> it's
> >>> not supported.
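
For reference, a minimal sketch of the fallback described above, using the
public Admin API (an illustration only, not the KIP's actual patch; roughly
what the CLI tool would do internally):

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.errors.UnsupportedVersionException;

public class AlterBrokerConfigSketch {

    // Try the incremental API first; fall back to the legacy API only if the
    // broker is too old (pre-2.3.0) to support it.
    static void setBrokerConfig(Admin admin, String brokerId, String key, String value)
            throws Exception {
        ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, brokerId);
        ConfigEntry entry = new ConfigEntry(key, value);
        Map<ConfigResource, Collection<AlterConfigOp>> incremental = Collections.singletonMap(
                broker, Collections.singleton(new AlterConfigOp(entry, AlterConfigOp.OpType.SET)));
        try {
            admin.incrementalAlterConfigs(incremental).all().get();
        } catch (ExecutionException e) {
            if (!(e.getCause() instanceof UnsupportedVersionException)) {
                throw e;
            }
            // Legacy path: the deprecated alterConfigs API. A real tool would first
            // describe the existing configs and resend all of them, which is exactly
            // the read-modify-write race the incremental API avoids.
            admin.alterConfigs(Collections.singletonMap(broker,
                    new Config(Collections.singleton(entry)))).all().get();
        }
    }
}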
> >>>
> >>> If there are other, more significant changes to the user-facing
> semantics
> >>> for the CLI, then these should be called out here and in the KIP, and
> we
> >>> might consider a more cautious approach.
> >>>
> >>> [1] -
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-894%3A+Use+incrementalAlterConfigs+API+for+syncing+topic+configurations
> >>>
> >>>
> >>> Also, regarding this part of the KIP:
> >>>
>  incrementalAlterConfigs is more convenient especially for updating
> >>> configs of list data type, such as
> >> "leader.replication.throttled.replicas"
> >>>
> >>> While this is true for the Java admin client and the corresponding
> broker
> >>> APIs, it doesn't appear to be relevant to the kafka-configs.sh CLI
> tool.
> >> We
> >>> don't appear to support appending/subtracting from list properties via
> >> the
> >>> CLI for any other entity type right now, and there's nothing in the KIP
> >>> that leads me to believe we'd be adding it for broker configs.
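
As an aside, the list-type convenience mentioned above looks like this at the
Admin API level (illustrative only; the config name and values are just
examples, and kafka-configs.sh does not expose these operations today):

import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;

public class ListConfigOpsSketch {
    // Append to / subtract from a list-valued config entry-by-entry instead of
    // resending the whole list.
    static AlterConfigOp appendThrottledReplica(String partitionAndBroker) {
        return new AlterConfigOp(
                new ConfigEntry("leader.replication.throttled.replicas", partitionAndBroker),
                AlterConfigOp.OpType.APPEND);
    }

    static AlterConfigOp removeThrottledReplica(String partitionAndBroker) {
        return new AlterConfigOp(
                new ConfigEntry("leader.replication.throttled.replicas", partitionAndBroker),
                AlterConfigOp.OpType.SUBTRACT);
    }
}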
> >>>
> >>> Cheers,
> >>>
> >>> Chris
> >>>
> >>> On Thu, Jan 4, 2024 at 10:12 PM ziming deng  >> >
> >>> wrote:
> >>>
>  Hi Ismael,
>  I added the automatic approach to “Rejected alternatives” out of concern
>  that we need to unify the semantics between alterConfigs and
>  incrementalAlterConfigs, so I chose to leave the choice to users.
> 
>  After reviewing the code and doing some tests, I found that they follow
>  a similar approach; I think the simplest way is to let the client choose
>  the best method heuristically.
> 
>  Thank you for pointing this out; I will change the KIP later.
> 
>  Best,
>  Ziming
> 
> > On Jan 4, 2024, at 17:28, Ismael

[jira] [Created] (KAFKA-16163) Constant resignation/reelection of controller when starting a single node in combined mode

2024-01-18 Thread Mickael Maison (Jira)
Mickael Maison created KAFKA-16163:
--

 Summary: Constant resignation/reelection of controller when 
starting a single node in combined mode
 Key: KAFKA-16163
 URL: https://issues.apache.org/jira/browse/KAFKA-16163
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Mickael Maison


When starting a single node in combined mode:
{noformat}
$ KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c 
config/kraft/server.properties
$ bin/kafka-server-start.sh config/kraft/server.properties{noformat}
 

it's constantly spamming the logs with:
{noformat}
[2024-01-18 17:37:09,065] INFO 
[broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
from now on will use node localhost:9093 (id: 1 rack: null) 
(kafka.server.NodeToControllerRequestThread)
[2024-01-18 17:37:11,967] INFO [RaftManager id=1] Did not receive fetch request 
from the majority of the voters within 3000ms. Current fetched voters are []. 
(org.apache.kafka.raft.LeaderState)
[2024-01-18 17:37:11,967] INFO [RaftManager id=1] Completed transition to 
ResignedState(localId=1, epoch=138, voters=[1], electionTimeoutMs=1864, 
unackedVoters=[], preferredSuccessors=[]) from Leader(localId=1, epoch=138, 
epochStartOffset=829, highWatermark=Optional[LogOffsetMetadata(offset=835, 
metadata=Optional[(segmentBaseOffset=0,relativePositionInSegment=62788)])], 
voterStates={1=ReplicaState(nodeId=1, 
endOffset=Optional[LogOffsetMetadata(offset=835, 
metadata=Optional[(segmentBaseOffset=0,relativePositionInSegment=62788)])], 
lastFetchTimestamp=-1, lastCaughtUpTimestamp=-1, hasAcknowledgedLeader=true)}) 
(org.apache.kafka.raft.QuorumState)
[2024-01-18 17:37:13,072] INFO [NodeToControllerChannelManager id=1 
name=heartbeat] Client requested disconnect from node 1 
(org.apache.kafka.clients.NetworkClient)
[2024-01-18 17:37:13,072] INFO 
[broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
from now on will use node localhost:9093 (id: 1 rack: null) 
(kafka.server.NodeToControllerRequestThread)
[2024-01-18 17:37:13,123] INFO 
[broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
from now on will use node localhost:9093 (id: 1 rack: null) 
(kafka.server.NodeToControllerRequestThread)
[2024-01-18 17:37:13,124] INFO [NodeToControllerChannelManager id=1 
name=heartbeat] Client requested disconnect from node 1 
(org.apache.kafka.clients.NetworkClient)
[2024-01-18 17:37:13,124] INFO 
[broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
from now on will use node localhost:9093 (id: 1 rack: null) 
(kafka.server.NodeToControllerRequestThread)
[2024-01-18 17:37:13,175] INFO 
[broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
from now on will use node localhost:9093 (id: 1 rack: null) 
(kafka.server.NodeToControllerRequestThread)
[2024-01-18 17:37:13,176] INFO [NodeToControllerChannelManager id=1 
name=heartbeat] Client requested disconnect from node 1 
(org.apache.kafka.clients.NetworkClient)
[2024-01-18 17:37:13,176] INFO 
[broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
from now on will use node localhost:9093 (id: 1 rack: null) 
(kafka.server.NodeToControllerRequestThread)
[2024-01-18 17:37:13,227] INFO 
[broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
from now on will use node localhost:9093 (id: 1 rack: null) 
(kafka.server.NodeToControllerRequestThread)
[2024-01-18 17:37:13,229] INFO [NodeToControllerChannelManager id=1 
name=heartbeat] Client requested disconnect from node 1 
(org.apache.kafka.clients.NetworkClient)
[2024-01-18 17:37:13,229] INFO 
[broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
from now on will use node localhost:9093 (id: 1 rack: null) 
(kafka.server.NodeToControllerRequestThread)
[2024-01-18 17:37:13,279] INFO 
[broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
from now on will use node localhost:9093 (id: 1 rack: null) 
(kafka.server.NodeToControllerRequestThread){noformat}
This did not happen in 3.6.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16164) Pre-Vote

2024-01-18 Thread Alyssa Huang (Jira)
Alyssa Huang created KAFKA-16164:


 Summary: Pre-Vote
 Key: KAFKA-16164
 URL: https://issues.apache.org/jira/browse/KAFKA-16164
 Project: Kafka
  Issue Type: Improvement
Reporter: Alyssa Huang


Implementing pre-vote as described in 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2585

2024-01-18 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16165) Consumer invalid transition on expired poll interval

2024-01-18 Thread Lianet Magrans (Jira)
Lianet Magrans created KAFKA-16165:
--

 Summary: Consumer invalid transition on expired poll interval
 Key: KAFKA-16165
 URL: https://issues.apache.org/jira/browse/KAFKA-16165
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Lianet Magrans
Assignee: Lianet Magrans


Running system tests with the new async consumer revealed an invalid transition 
related to the consumer not being polled within the max poll interval in some 
scenario (maybe related to consumer close, as the transition is leaving->stale).

Log trace:

[2024-01-17 19:45:07,379] WARN [Consumer 
clientId=consumer.6aa7cd1c-c83f-47e1-8f8f-b38a459a05d8-0, 
groupId=consumer-groups-test-2] consumer poll timeout has expired. This means 
the time between subsequent calls to poll() was longer than the configured 
max.poll.interval.ms, which typically implies that the poll loop is spending 
too much time processing messages. You can address this either by increasing 
max.poll.interval.ms or by reducing the maximum size of batches returned in 
poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
[2024-01-17 19:45:07,379] ERROR [Consumer 
clientId=consumer.6aa7cd1c-c83f-47e1-8f8f-b38a459a05d8-0, 
groupId=consumer-groups-test-2] Unexpected error caught in consumer network 
thread (org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread:91)
java.lang.IllegalStateException: Invalid state transition from LEAVING to STALE
at 
org.apache.kafka.clients.consumer.internals.MembershipManagerImpl.transitionTo(MembershipManagerImpl.java:303)
at 
org.apache.kafka.clients.consumer.internals.MembershipManagerImpl.transitionToStale(MembershipManagerImpl.java:739)
at 
org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager.poll(HeartbeatRequestManager.java:194)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.lambda$runOnce$0(ConsumerNetworkThread.java:137)
at 
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at 
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at 
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at 
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at 
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:657)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.runOnce(ConsumerNetworkThread.java:139)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.run(ConsumerNetworkThread.java:88)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15807) Add support for compression/decompression of metrics

2024-01-18 Thread Apoorv Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apoorv Mittal resolved KAFKA-15807.
---
Resolution: Done

> Add support for compression/decompression of metrics
> 
>
> Key: KAFKA-15807
> URL: https://issues.apache.org/jira/browse/KAFKA-15807
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Apoorv Mittal
>Assignee: Apoorv Mittal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: Re: [DISCUSS] KIP-334 Include partitions in exceptions during consumer record deserialization/validation (Round 2)

2024-01-18 Thread Daniel Häuser
Voting for this KIP passed and there is also a pull request: 
https://github.com/apache/kafka/pull/7499


Is there any chance to get this merged?

Best regards
Daniel


On 2021/05/07 07:07:11 Sarwar Bhuiyan wrote:
> Jason, Colin, and Mathias, do you have any comments on the KIP as it
> stands now?
>
> Thank you.
>
> Sarwar
>
> On Fri, 23 Apr 2021 at 10:12, Sarwar Bhuiyan  wrote:
>
> > Hi all,
> >
> > I'm reviving this discussion after perusing the vote thread created by
> > Stan. I believe the KIP is already revised to include Jason and Colin's
> > suggestion of using a RecordDeserializationException. It signals to the
> > user that something needs to be done (whether skipping the record as shown
> > in the example and/or do something else). This is backward compatible with
> > what we have today and is clean.
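
For context, the usage pattern described above (skip the poison-pill record
and keep consuming) would look roughly like this -- a sketch that assumes the
exception exposes the failing partition and offset as the KIP proposes:

import java.time.Duration;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.errors.RecordDeserializationException;

public class SkipBadRecordSketch {
    // Poll loop that seeks past a record that cannot be deserialized and keeps
    // consuming, instead of being stuck retrying the same offset forever.
    static <K, V> void pollLoop(Consumer<K, V> consumer) {
        while (true) {
            try {
                ConsumerRecords<K, V> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(record -> { /* process the record */ });
            } catch (RecordDeserializationException e) {
                // Skip the record that failed to deserialize and continue.
                consumer.seek(e.topicPartition(), e.offset() + 1);
            }
        }
    }
}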
> >
> > Mathias suggested that we split the exception types into a
> > RecordDeserializationException and a CorruptRecordException with the
> > intention of auto-skipping corrupt records. I would prefer to use another
> > KIP for that type of functionality.
> >
> > KIP:
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87297793
> >
> > JIRAs:
> > https://issues.apache.org/jira/browse/KAFKA-5682
> >
> > Last Vote thread comment from Stan:
> > https://www.mail-archive.com/dev@kafka.apache.org/msg92392.html
> >
> > On another note, I need edit access to the KIP page. Could somebody give
> > me access should I need to make changes? My username is sarwarb.
> >
> > Thanks.
> >
> > Sarwar
> >
> >
> >
> > --
>
>
> Sarwar Bhuiyan
> Staff Customer Success Technical Architect
> +447949684437
>




[jira] [Resolved] (KAFKA-16087) Tasks dropping incorrect records when errors.tolerance=all and errors reported asynchronously due to data race

2024-01-18 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-16087.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Tasks dropping incorrect records when errors.tolerance=all and errors 
> reported asynchronously due to data race
> --
>
> Key: KAFKA-16087
> URL: https://issues.apache.org/jira/browse/KAFKA-16087
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.2.0, 3.7.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.8.0
>
>
> The ErrantRecordReporter introduced in KIP-610 (2.6.0) allows sink connectors 
> to push records to the connector DLQ topic. The implementation of this 
> reporter interacts with the ProcessingContext within the per-task 
> RetryWithToleranceOperator. The ProcessingContext stores mutable state about 
> the current operation, such as what error has occurred or what record is 
> being operated on.
> The ProcessingContext and RetryWithToleranceOperator is also used by the 
> converter and transformation pipeline of the connector for similar reasons. 
> When the ErrantRecordReporter#report function is called from SinkTask#put, 
> there is no contention over the mutable state, as the thread used for 
> SinkTask#put is also responsible for converting and transforming the record. 
> However, if ErrantRecordReporter#report is called by an extra thread within 
> the SinkTask, there is thread contention on the single mutable 
> ProcessingContext.
> This was noticed in https://issues.apache.org/jira/browse/KAFKA-10602 and the 
> synchronized keyword was added to all methods of RetryWithToleranceOperator 
> which interact with the ProcessingContext. However, this solution still 
> allows the RWTO methods to interleave, and produce unintended data races. 
> Consider the following interleaving:
> 1. Thread 1 converts and transforms record A successfully.
> 2. Thread 1 calls SinkTask#put(A) and delivers the message to the task.
> 3. Thread 1 queues some other thread 2 with some delay to call 
> ErrantRecordReporter#report(A).
> 4. Thread 1 returns from SinkTask#put and polls record B from the consumer.
> 5. Thread 1 calls RWTO#execute for a converter or transformation operation. 
> For example, [converting 
> headers|https://github.com/apache/kafka/blob/c0b649345580e4dfb2ebb88d3aaace71afe70d75/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L539]
> 6. The operation succeeds, and the ProcessingContext is left with error == 
> null, or equivalently failed() == false.
> 7. Thread 2 has its delay expire, and it calls ErrantRecordReporter#report.
> 8. Thread 2 uses the WorkerErrantRecordReporter implementation, which calls 
> [RWTO 
> executeFailed|https://github.com/apache/kafka/blob/c0b649345580e4dfb2ebb88d3aaace71afe70d75/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/errors/WorkerErrantRecordReporter.java#L109]
>  and returns.
> 9. The operation leaves ProcessingContext with error != null, or equivalently 
> failed() == true.
> 10. Thread 1 then resumes execution, and calls [RWTO 
> failed|https://github.com/apache/kafka/blob/c0b649345580e4dfb2ebb88d3aaace71afe70d75/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L541]
>  which evaluates to true.
> 11. Thread 1 then drops Record B, even though the header conversion succeeded 
> without error.
> 12. Record B is never delivered to the Sink Task, and never delivered to the 
> error reporter for processing, despite having produced no error during 
> processing.
> This per-method synchronization for returning nulls and errors separately is 
> insufficient, and either the data sharing should be avoided or a different 
> locking mechanism should be used.
> A similar flaw exists for source connectors, where errors are reported 
> asynchronously by the producer; it was introduced in KIP-779 (3.2.0).
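
A stripped-down illustration of the interleaving above (these are not the actual
Connect classes, just a sketch of the shared-state problem): each method is
atomic on its own, but the task thread's execute()-then-failed() sequence is not,
so the asynchronous report can slip in between.

{code:java}
// Illustration only -- not the real RetryWithToleranceOperator. Both methods
// are synchronized, yet the execute()-then-failed() sequence used by the task
// thread is not atomic, so an asynchronous executeFailed() can interleave.
class SharedErrorState {
    private Throwable error;

    synchronized void execute(Runnable operation) {
        error = null;           // steps 5-6: conversion succeeds, error cleared
        operation.run();
    }

    synchronized void executeFailed(Throwable t) {
        error = t;              // steps 8-9: async report records an error
    }

    synchronized boolean failed() {
        return error != null;   // step 10: task thread now sees the reporter's error
    }
}
{code}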



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16166) Generify RetryWithToleranceOperator and ErrorReporter

2024-01-18 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16166:
---

 Summary: Generify RetryWithToleranceOperator and ErrorReporter
 Key: KAFKA-16166
 URL: https://issues.apache.org/jira/browse/KAFKA-16166
 Project: Kafka
  Issue Type: Improvement
  Components: connect
Reporter: Greg Harris
Assignee: Greg Harris


The RetryWithToleranceOperator and ErrorReporter instances in Connect are only 
ever used with a single type of ProcessingContext (one parameterization for 
source records, another for sink records), yet they currently decide between 
these dynamically with instanceof checks.

Instead, these classes should be generic, and their implementations should 
accept a consistently typed ProcessingContext object.
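
A rough sketch of the direction (type and method names here are illustrative, 
not the final Connect API):

{code:java}
// Illustrative sketch: the processing context becomes a type parameter, so
// reporters and the tolerance operator are bound to a single record type at
// construction time instead of branching on instanceof at runtime.
interface ProcessingContext<T> {
    T original();
    void error(Throwable t);
}

interface ErrorReporter<T> {
    void report(ProcessingContext<T> context);
}

class RetryWithToleranceOperator<T> {
    private final java.util.List<ErrorReporter<T>> reporters;

    RetryWithToleranceOperator(java.util.List<ErrorReporter<T>> reporters) {
        this.reporters = reporters;
    }

    void executeFailed(ProcessingContext<T> context, Throwable error) {
        context.error(error);
        reporters.forEach(r -> r.report(context)); // no instanceof checks needed
    }
}
{code}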



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-890 Server Side Defense

2024-01-18 Thread Jun Rao
Hi, Justine,

I don't see this creating any issue. It just makes it a bit hard to explain
what this non-tagged produce id field means. We are essentially trying to
combine two actions (completing a txn and initializing a new produce id) in a
single record. But this may be fine too.

A few other follow up comments.

101.3 I guess the reason that we only set the previous produce id tagged
field in the complete marker, but not in the prepare marker, is that in the
prepare state, we always return CONCURRENT_TRANSACTIONS on retried endMarker
requests?

110. "I believe your second point is mentioned in the KIP. I can add more
text on
this if it is helpful.
> The delayed message case can also violate EOS if the delayed message
comes in after the next addPartitionsToTxn request comes in. Effectively we
may see a message from a previous (aborted) transaction become part of the
next transaction."

The above is the case when a delayed message is appended to the data
partition. What I mentioned is a slightly different case when a delayed
marker is appended to the transaction log partition.

111. The KIP says "Once we move past the Prepare and Complete states, we
don’t need to worry about lastSeen fields and clear them, just handle state
transitions as normal.". Is the lastSeen field the same as the previous
Produce Id tagged field in TransactionLogValue?

112. Since the KIP changes the inter-broker protocol, should we bump up the
MV/IBP version? Is this feature only for the KRaft mode?

Thanks,

Jun


On Wed, Jan 17, 2024 at 11:13 AM Justine Olshan
 wrote:

> Hey Jun,
>
> I'm glad we are getting to convergence on the design. :)
>
> While I understand it seems a little "weird", I'm not sure what the benefit
> of writing an extra record to the log is.
> Is the concern that a tool to describe transactions won't work (i.e., the complete
> state is needed to calculate the time since the transaction completed)?
> If we have a reason like this, it is enough to convince me we need such an
> extra record. It seems like it would be replacing the record written on
> InitProducerId. Is this correct?
>
> Thanks,
> Justine
>
> On Tue, Jan 16, 2024 at 5:14 PM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the explanation. I understand the intention now. In the
> overflow
> > case, we set the non-tagged field to the old pid (and the max epoch) in
> the
> > prepare marker so that we could correctly write the marker to the data
> > partition if the broker downgrades. When writing the complete marker, we
> > know the marker has already been written to the data partition. We set
> the
> > non-tagged field to the new pid to avoid InvalidPidMappingException in
> the
> > client if the broker downgrades.
> >
> > The above seems to work. It's just a bit inconsistent for a prepare
> marker
> > and a complete marker to use different pids in this special case. If we
> > downgrade with the complete marker, it seems that we will never be able
> to
> > write the complete marker with the old pid. Not sure if it causes any
> > issue, but it seems a bit weird. Instead of writing the complete marker
> > with the new pid, could we write two records: a complete marker with the
> > old pid followed by a TransactionLogValue with the new pid and an empty
> > state? We could make the two records in the same batch so that they will
> be
> > added to the log atomically.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Fri, Jan 12, 2024 at 5:40 PM Justine Olshan
> > 
> > wrote:
> >
> > > (1) the prepare marker is written, but the endTxn response is not
> > received
> > > by the client when the server downgrades
> > > (2)  the prepare marker is written, the endTxn response is received by
> > the
> > > client when the server downgrades.
> > >
> > > I think I am still a little confused. In both of these cases, the
> > > transaction log has the old producer ID. We don't write the new
> producer
> > ID
> > > in the prepare marker's non tagged fields.
> > > If the server downgrades now, it would read the records not in tagged
> > > fields and the complete marker will also have the old producer ID.
> > > (If we had used the new producer ID, we would not have transactional
> > > correctness since the producer id doesn't match the transaction and the
> > > state would not be correct on the data partition.)
> > >
> > > In the overflow case, I'd expect the following to happen on the client
> > side
> > > Case 1  -- we retry EndTxn -- it is the same producer ID and epoch - 1
> > this
> > > would fence the producer
> > > Case 2 -- we don't retry EndTxn and use the new producer id which would
> > > result in InvalidPidMappingException
> > >
> > > Maybe we can have special handling for when a server downgrades. When
> it
> > > reconnects we could get an API version request showing KIP-890 part 2
> is
> > > not supported. In that case, we can call initProducerId to abort the
> > > transaction. (In the overflow case, this correctly gives us a new
> > producer
> > > ID)
> > >
> > > I guess the cor

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2586

2024-01-18 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16167) Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup

2024-01-18 Thread Kirk True (Jira)
Kirk True created KAFKA-16167:
-

 Summary: Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup
 Key: KAFKA-16167
 URL: https://issues.apache.org/jira/browse/KAFKA-16167
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-890 Server Side Defense

2024-01-18 Thread Justine Olshan
Hey Jun,

101.3 We don't set the previous ID in the Prepare field since we don't need
it. It is the same producer ID as the main producer ID field.

110 Hmm -- maybe I need to reread your message about delayed markers. If we
receive a delayed endTxn marker after the transaction is already complete?
So we will commit the next transaction early without the fixes in part 2?

111 Yes -- this terminology was used in a previous KIP but was never
implemented in the log -- only in memory.

112 Hmm -- which inter-broker protocol are you referring to? I am working on
the design for the work to remove the extra add partitions call, and right
now the design bumps MV. I have yet to update that section as I finalize
the design, so please stay tuned. Was there anything else you thought needed
an MV bump?

Justine

On Thu, Jan 18, 2024 at 3:07 PM Jun Rao  wrote:

> Hi, Justine,
>
> I don't see this creating any issue. It just makes it a bit hard to explain
> what this non-tagged produce id field means. We are essentially trying to
> combine two actions (completing a txn and initializing a new produce id) in a
> single record. But this may be fine too.
>
> A few other follow up comments.
>
> 101.3 I guess the reason that we only set the previous produce id tagged
> field in the complete marker, but not in the prepare marker, is that in the
> prepare state, we always return CONCURRENT_TRANSACTIONS on retried endMarker
> requests?
>
> 110. "I believe your second point is mentioned in the KIP. I can add more
> text on
> this if it is helpful.
> > The delayed message case can also violate EOS if the delayed message
> comes in after the next addPartitionsToTxn request comes in. Effectively we
> may see a message from a previous (aborted) transaction become part of the
> next transaction."
>
> The above is the case when a delayed message is appended to the data
> partition. What I mentioned is a slightly different case when a delayed
> marker is appended to the transaction log partition.
>
> 111. The KIP says "Once we move past the Prepare and Complete states, we
> don’t need to worry about lastSeen fields and clear them, just handle state
> transitions as normal.". Is the lastSeen field the same as the previous
> Produce Id tagged field in TransactionLogValue?
>
> 112. Since the KIP changes the inter-broker protocol, should we bump up the
> MV/IBP version? Is this feature only for the KRaft mode?
>
> Thanks,
>
> Jun
>
>
> On Wed, Jan 17, 2024 at 11:13 AM Justine Olshan
>  wrote:
>
> > Hey Jun,
> >
> > I'm glad we are getting to convergence on the design. :)
> >
> > While I understand it seems a little "weird", I'm not sure what the benefit
> > of writing an extra record to the log is.
> > Is the concern that a tool to describe transactions won't work (i.e., the
> > complete state is needed to calculate the time since the transaction completed)?
> > If we have a reason like this, it is enough to convince me we need such
> an
> > extra record. It seems like it would be replacing the record written on
> > InitProducerId. Is this correct?
> >
> > Thanks,
> > Justine
> >
> > On Tue, Jan 16, 2024 at 5:14 PM Jun Rao 
> wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the explanation. I understand the intention now. In the
> > overflow
> > > case, we set the non-tagged field to the old pid (and the max epoch) in
> > the
> > > prepare marker so that we could correctly write the marker to the data
> > > partition if the broker downgrades. When writing the complete marker,
> we
> > > know the marker has already been written to the data partition. We set
> > the
> > > non-tagged field to the new pid to avoid InvalidPidMappingException in
> > the
> > > client if the broker downgrades.
> > >
> > > The above seems to work. It's just a bit inconsistent for a prepare
> > marker
> > > and a complete marker to use different pids in this special case. If we
> > > downgrade with the complete marker, it seems that we will never be able
> > to
> > > write the complete marker with the old pid. Not sure if it causes any
> > > issue, but it seems a bit weird. Instead of writing the complete marker
> > > with the new pid, could we write two records: a complete marker with
> the
> > > old pid followed by a TransactionLogValue with the new pid and an empty
> > > state? We could make the two records in the same batch so that they
> will
> > be
> > > added to the log atomically.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Fri, Jan 12, 2024 at 5:40 PM Justine Olshan
> > > 
> > > wrote:
> > >
> > > > (1) the prepare marker is written, but the endTxn response is not
> > > received
> > > > by the client when the server downgrades
> > > > (2)  the prepare marker is written, the endTxn response is received
> by
> > > the
> > > > client when the server downgrades.
> > > >
> > > > I think I am still a little confused. In both of these cases, the
> > > > transaction log has the old producer ID. We don't write the new
> > producer
> > > ID
> > > > in the prepare marker's n

[jira] [Resolved] (KAFKA-16163) Constant resignation/reelection of controller when starting a single node in combined mode

2024-01-18 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16163.
---
Resolution: Duplicate

> Constant resignation/reelection of controller when starting a single node in 
> combined mode
> --
>
> Key: KAFKA-16163
> URL: https://issues.apache.org/jira/browse/KAFKA-16163
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Mickael Maison
>Priority: Major
>
> When starting a single node in combined mode:
> {noformat}
> $ KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
> $ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c 
> config/kraft/server.properties
> $ bin/kafka-server-start.sh config/kraft/server.properties{noformat}
>  
> it's constantly spamming the logs with:
> {noformat}
> [2024-01-18 17:37:09,065] INFO 
> [broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
> from now on will use node localhost:9093 (id: 1 rack: null) 
> (kafka.server.NodeToControllerRequestThread)
> [2024-01-18 17:37:11,967] INFO [RaftManager id=1] Did not receive fetch 
> request from the majority of the voters within 3000ms. Current fetched voters 
> are []. (org.apache.kafka.raft.LeaderState)
> [2024-01-18 17:37:11,967] INFO [RaftManager id=1] Completed transition to 
> ResignedState(localId=1, epoch=138, voters=[1], electionTimeoutMs=1864, 
> unackedVoters=[], preferredSuccessors=[]) from Leader(localId=1, epoch=138, 
> epochStartOffset=829, highWatermark=Optional[LogOffsetMetadata(offset=835, 
> metadata=Optional[(segmentBaseOffset=0,relativePositionInSegment=62788)])], 
> voterStates={1=ReplicaState(nodeId=1, 
> endOffset=Optional[LogOffsetMetadata(offset=835, 
> metadata=Optional[(segmentBaseOffset=0,relativePositionInSegment=62788)])], 
> lastFetchTimestamp=-1, lastCaughtUpTimestamp=-1, 
> hasAcknowledgedLeader=true)}) (org.apache.kafka.raft.QuorumState)
> [2024-01-18 17:37:13,072] INFO [NodeToControllerChannelManager id=1 
> name=heartbeat] Client requested disconnect from node 1 
> (org.apache.kafka.clients.NetworkClient)
> [2024-01-18 17:37:13,072] INFO 
> [broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
> from now on will use node localhost:9093 (id: 1 rack: null) 
> (kafka.server.NodeToControllerRequestThread)
> [2024-01-18 17:37:13,123] INFO 
> [broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
> from now on will use node localhost:9093 (id: 1 rack: null) 
> (kafka.server.NodeToControllerRequestThread)
> [2024-01-18 17:37:13,124] INFO [NodeToControllerChannelManager id=1 
> name=heartbeat] Client requested disconnect from node 1 
> (org.apache.kafka.clients.NetworkClient)
> [2024-01-18 17:37:13,124] INFO 
> [broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
> from now on will use node localhost:9093 (id: 1 rack: null) 
> (kafka.server.NodeToControllerRequestThread)
> [2024-01-18 17:37:13,175] INFO 
> [broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
> from now on will use node localhost:9093 (id: 1 rack: null) 
> (kafka.server.NodeToControllerRequestThread)
> [2024-01-18 17:37:13,176] INFO [NodeToControllerChannelManager id=1 
> name=heartbeat] Client requested disconnect from node 1 
> (org.apache.kafka.clients.NetworkClient)
> [2024-01-18 17:37:13,176] INFO 
> [broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
> from now on will use node localhost:9093 (id: 1 rack: null) 
> (kafka.server.NodeToControllerRequestThread)
> [2024-01-18 17:37:13,227] INFO 
> [broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
> from now on will use node localhost:9093 (id: 1 rack: null) 
> (kafka.server.NodeToControllerRequestThread)
> [2024-01-18 17:37:13,229] INFO [NodeToControllerChannelManager id=1 
> name=heartbeat] Client requested disconnect from node 1 
> (org.apache.kafka.clients.NetworkClient)
> [2024-01-18 17:37:13,229] INFO 
> [broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
> from now on will use node localhost:9093 (id: 1 rack: null) 
> (kafka.server.NodeToControllerRequestThread)
> [2024-01-18 17:37:13,279] INFO 
> [broker-1-to-controller-heartbeat-channel-manager]: Recorded new controller, 
> from now on will use node localhost:9093 (id: 1 rack: null) 
> (kafka.server.NodeToControllerRequestThread){noformat}
> This did not happen in 3.6.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2587

2024-01-18 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #69

2024-01-18 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2588

2024-01-18 Thread Apache Jenkins Server
See