gerlowskija commented on code in PR #4171:
URL: https://github.com/apache/solr/pull/4171#discussion_r2960614143
##########
solr/core/src/test/org/apache/solr/handler/admin/api/NodeHealthTest.java:
##########
@@ -28,30 +28,21 @@
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrException.ErrorCode;
import org.apache.solr.embedded.JettySolrRunner;
-import org.apache.solr.util.SolrJettyTestRule;
import org.junit.BeforeClass;
-import org.junit.ClassRule;
import org.junit.Test;
public class NodeHealthTest extends SolrCloudTestCase {
- /**
Review Comment:
[-0] I love that this test is now SolrCloud only, but should it's name be
changed to indicate that. And maybe it should have Javadocs that link to its
standalone counterpart.
e.g.
```
/**
* Tests for the node-health API, on SolrCloud clusters
*
* @see NodeHealthStandaloneTest
*/
public class NodeHealthSolrCloudTest extends SolrCloudTestCase {
```
##########
solr/core/src/test/org/apache/solr/handler/admin/api/NodeHealthTest.java:
##########
@@ -85,9 +77,10 @@ public void testCloudMode_UnhealthyWhenZkClientClosed()
throws Exception {
SolrException e =
assertThrows(SolrException.class, () -> new
NodeApi.Healthcheck().process(nodeClient));
assertEquals(ErrorCode.SERVICE_UNAVAILABLE.code, e.code());
- assertTrue(
+ assertThat(
Review Comment:
[0] If you're taking this Hamcrest approach, you don't even need the first
argument: `"Expected 'host Unavailable' in exception message"`.
In fact, the default message Hamcrest will print on failure is **better**
than the override you've specified here since the default message will include
the actual message value.
##########
solr/solr-ref-guide/modules/deployment-guide/pages/user-managed-index-replication.adoc:
##########
@@ -575,6 +575,63 @@ A snapshot with the name `snapshot._name_` must exist or
an error will be return
`location`::: The location where the snapshot is created.
+[[monitoring-follower-replication-lag]]
+== Monitoring Follower Replication Lag
+
+In a leader-follower deployment it is important to know whether followers are
keeping pace with the leader.
+Solr's health-check endpoint supports a `maxGenerationLag` request parameter
that lets you assert that each follower core is within a specified number of
Lucene commit generations of its leader.
+When the follower is lagging more than the allowed number of generations the
endpoint returns HTTP 503 (Service Unavailable), making it straightforward to
integrate into load-balancer health probes or monitoring systems.
+
+The `maxGenerationLag` parameter is an integer representing the maximum
acceptable number of commit generations by which a follower is allowed to trail
its leader.
+A value of `0` requires the follower to be fully up to date.
+If the parameter is omitted, the health check returns `OK` regardless of
replication lag.
+
+[WARNING]
+====
+Because a follower's generation can only increase when a replication from the
leader actually completes, `maxGenerationLag=0` may return `FAILURE`
immediately after a follower starts or after a period of network instability
even though the follower will catch up on the next poll cycle.
Review Comment:
[Q] Did this information come from somewhere? Or is it AI generated? (And
if the latter - is it something you're familiar with and can confirm is
correct?)
It all sounds very plausible, and if it's correct then the docs are
excellent. But I wanted to highlight that I can't validate its correctness as
a reviewer because I don't have a lot of experience deploying leader/follower
personally.
##########
changelog/unreleased/SOLR-16458-migrate-node-health-api.yml:
##########
@@ -1,7 +1,8 @@
-title: "SolrJ now offers a SolrRequest class allowing users to perform
single-node healthchecks: NodeApi.Healthcheck"
+title: "SolrJ now offers a SolrRequest class allowing users to perform v2
single-node healthchecks: NodeApi.Healthcheck"
type: added
authors:
- name: Eric Pugh
+ - name: Jason Gerlowski
Review Comment:
[Q] (General question; not PR specific)
Is it our convention to add reviewers here as well?
I'm all for that but it'd be great to document in `dev-docs/changelog.adoc`
if that's a convention we'd like to cement!
##########
solr/api/src/java/org/apache/solr/client/api/endpoint/NodeHealthApi.java:
##########
@@ -30,5 +31,14 @@ public interface NodeHealthApi {
@Operation(
summary = "Determine the health of a Solr node.",
tags = {"node"})
- NodeHealthResponse healthcheck(@QueryParam("requireHealthyCores") Boolean
requireHealthyCores);
+ NodeHealthResponse healthcheck(
+ @QueryParam("requireHealthyCores") Boolean requireHealthyCores,
+ @Parameter(
+ description =
+ "Maximum number of index generations a follower replica may
lag behind its"
+ + " leader before the health check reports FAILURE. Only
relevant when"
+ + " running in legacy (non-SolrCloud) mode with
leader/follower"
Review Comment:
[0] I've seen you remove "legacy" a few other places based on an earlier
comment, but I think this one got missed.
I don't mind that language personally, but I don't think the community as a
whole agrees on that yet and there's some probably some broader
consensus-building needed before we start calling leader/follower "legacy"
##########
solr/core/src/java/org/apache/solr/handler/admin/api/NodeHealth.java:
##########
@@ -0,0 +1,287 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.handler.admin.api;
+
+import static
org.apache.solr.client.api.model.NodeHealthResponse.NodeStatus.FAILURE;
+import static
org.apache.solr.client.api.model.NodeHealthResponse.NodeStatus.OK;
+import static org.apache.solr.common.SolrException.ErrorCode.SERVER_ERROR;
+import static
org.apache.solr.common.SolrException.ErrorCode.SERVICE_UNAVAILABLE;
+import static org.apache.solr.handler.admin.api.ReplicationAPIBase.GENERATION;
+import static org.apache.solr.security.PermissionNameProvider.Name.HEALTH_PERM;
+
+import jakarta.inject.Inject;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+import org.apache.lucene.index.IndexCommit;
+import org.apache.solr.api.JerseyResource;
+import org.apache.solr.client.api.endpoint.NodeHealthApi;
+import org.apache.solr.client.api.model.NodeHealthResponse;
+import org.apache.solr.cloud.CloudDescriptor;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.cloud.ClusterState;
+import org.apache.solr.common.cloud.Replica.State;
+import org.apache.solr.common.cloud.ZkStateReader;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.core.CoreContainer;
+import org.apache.solr.core.CoreDescriptor;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.handler.IndexFetcher;
+import org.apache.solr.handler.ReplicationHandler;
+import org.apache.solr.jersey.PermissionName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * V2 API for checking the health of the receiving node.
+ *
+ * <p>This API (GET /v2/node/health) is analogous to the v1 /admin/info/health.
+ *
+ * <p>The v1 {@link org.apache.solr.handler.admin.HealthCheckHandler}
delegates to this class.
+ */
+public class NodeHealth extends JerseyResource implements NodeHealthApi {
+
+ private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+ private static final List<State> UNHEALTHY_STATES =
Arrays.asList(State.DOWN, State.RECOVERING);
+
+ private final CoreContainer coreContainer;
+
+ @Inject
+ public NodeHealth(CoreContainer coreContainer) {
+ this.coreContainer = coreContainer;
+ }
+
+ @Override
+ @PermissionName(HEALTH_PERM)
+ public NodeHealthResponse healthcheck(Boolean requireHealthyCores) {
+ return checkNodeHealth(requireHealthyCores, null);
+ }
+
+ /**
+ * Performs the node health check and returns the result as a {@link
NodeHealthResponse}.
+ *
+ * <p>This overload is used by the v1 {@link
+ * org.apache.solr.handler.admin.HealthCheckHandler#handleRequestBody} path,
which can supply the
+ * legacy {@code maxGenerationLag} parameter that is not exposed via the v2
endpoint.
+ */
+ public NodeHealthResponse checkNodeHealth(Boolean requireHealthyCores,
Integer maxGenerationLag) {
+ if (coreContainer == null || coreContainer.isShutDown()) {
+ throw new SolrException(
+ SERVER_ERROR, "CoreContainer is either not initialized or shutting
down");
+ }
+
+ final NodeHealthResponse response =
instantiateJerseyResponse(NodeHealthResponse.class);
+
+ if (!coreContainer.isZooKeeperAware()) {
+ if (log.isDebugEnabled()) {
+ log.debug("Invoked HealthCheckHandler in legacy mode.");
+ }
+ healthCheckLegacyMode(response, maxGenerationLag);
+ } else {
+ if (log.isDebugEnabled()) {
+ log.debug(
+ "Invoked HealthCheckHandler in cloud mode on [{}]",
+ coreContainer.getZkController().getNodeName());
+ }
+ healthCheckCloudMode(response, requireHealthyCores);
+ }
+
+ return response;
+ }
+
+ private void healthCheckCloudMode(NodeHealthResponse response, Boolean
requireHealthyCores) {
+ ClusterState clusterState = getClusterState();
+
+ if (Boolean.TRUE.equals(requireHealthyCores)) {
+ if (!coreContainer.isStatusLoadComplete()) {
+ throw new SolrException(SERVICE_UNAVAILABLE, "Host Unavailable: Core
Loading not complete");
+ }
+ Collection<CloudDescriptor> coreDescriptors =
+ coreContainer.getCoreDescriptors().stream()
+ .map(CoreDescriptor::getCloudDescriptor)
+ .collect(Collectors.toList());
+ int unhealthyCores = findUnhealthyCores(coreDescriptors, clusterState);
+ if (unhealthyCores > 0) {
+ response.numCoresUnhealthy = unhealthyCores;
+ throw new SolrException(
+ SERVICE_UNAVAILABLE,
+ unhealthyCores
+ + " out of "
+ + coreContainer.getNumAllCores()
+ + " replicas are currently initializing or recovering");
+ }
+ response.message = "All cores are healthy";
+ }
+
+ response.status = OK;
+ }
+
+ private ClusterState getClusterState() {
+ ZkStateReader zkStateReader =
coreContainer.getZkController().getZkStateReader();
+ ClusterState clusterState = zkStateReader.getClusterState();
+
+ if (zkStateReader.getZkClient().isClosed() ||
!zkStateReader.getZkClient().isConnected()) {
+ throw new SolrException(SERVICE_UNAVAILABLE, "Host Unavailable: Not
connected to zk");
+ }
+
+ if
(!clusterState.getLiveNodes().contains(coreContainer.getZkController().getNodeName()))
{
+ throw new SolrException(SERVICE_UNAVAILABLE, "Host Unavailable: Not in
live nodes as per zk");
+ }
+ return clusterState;
+ }
+
+ private void healthCheckLegacyMode(NodeHealthResponse response, Integer
maxGenerationLag) {
+ List<String> laggingCoresInfo = new ArrayList<>();
+ boolean allCoresAreInSync = true;
+
+ if (maxGenerationLag != null) {
+ if (maxGenerationLag < 0) {
+ log.error("Invalid value for maxGenerationLag:[{}]", maxGenerationLag);
+ response.message =
+ String.format(Locale.ROOT, "Invalid value of maxGenerationLag:%s",
maxGenerationLag);
+ response.status = FAILURE;
+ return;
+ }
+
+ for (SolrCore core : coreContainer.getCores()) {
+ ReplicationHandler replicationHandler =
+ (ReplicationHandler)
core.getRequestHandler(ReplicationHandler.PATH);
+ if (replicationHandler.isFollower()) {
+ boolean isCoreInSync =
+ isWithinGenerationLag(core, replicationHandler,
maxGenerationLag, laggingCoresInfo);
+ allCoresAreInSync &= isCoreInSync;
+ }
+ }
+
+ if (allCoresAreInSync) {
+ response.message =
+ String.format(
+ Locale.ROOT,
+ "All the followers are in sync with leader (within
maxGenerationLag: %d) "
+ + "or the cores are acting as leader",
+ maxGenerationLag);
+ response.status = OK;
+ } else {
+ response.message =
+ String.format(
+ Locale.ROOT,
+ "Cores violating maxGenerationLag:%d.%n%s",
+ maxGenerationLag,
+ String.join(",\n", laggingCoresInfo));
+ response.status = FAILURE;
+ }
+ } else {
+ response.message =
+ "maxGenerationLag isn't specified. Followers aren't "
+ + "checking for the generation lag from the leaders";
+ response.status = OK;
+ }
+ }
+
+ private boolean isWithinGenerationLag(
+ final SolrCore core,
+ ReplicationHandler replicationHandler,
+ int maxGenerationLag,
+ List<String> laggingCoresInfo) {
+ IndexFetcher indexFetcher = null;
+ try {
+ // may not be the best way to get leader's replicableCommit; NamedList
is unavoidable here
+ // as it is the init-args format used by ReplicationHandler
+ NamedList<?> follower = (NamedList<?>)
replicationHandler.getInitArgs().get("follower");
+ indexFetcher = new IndexFetcher(follower, replicationHandler, core);
+ // getLatestVersion() returns a NamedList from the IndexFetcher network
API
+ NamedList<?> replicableCommitOnLeader = indexFetcher.getLatestVersion();
+ long leaderGeneration = (Long) replicableCommitOnLeader.get(GENERATION);
+
+ // Get our own commit and generation from the commit
+ IndexCommit commit = core.getDeletionPolicy().getLatestCommit();
+ if (commit != null) {
+ long followerGeneration = commit.getGeneration();
+ long generationDiff = leaderGeneration - followerGeneration;
+
+ // generationDiff shouldn't be negative except for some edge cases,
log it. Some scenarios
+ // are:
+ // 1) commit generation rolls over Long.MAX_VALUE (really unlikely)
+ // 2) Leader's index is wiped clean and the follower is still showing
commit generation
+ // from the old index
+ if (generationDiff < 0) {
+ log.warn("core:[{}], generation lag:[{}] is negative.", core,
generationDiff);
Review Comment:
[-1] Your response seems to be talking about the codeblock that starts on
L217 a line or two below this one, which compares generationDiff and
maxGenerationLag. Did you maybe leave this reply in the wrong place?
**This** codeblock is checking whether generationDiff is negative. So your
reply above and the Copilot rationale aren't really relevant AFAICT?
##########
solr/core/src/java/org/apache/solr/handler/admin/api/NodeHealth.java:
##########
@@ -0,0 +1,287 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.handler.admin.api;
+
+import static
org.apache.solr.client.api.model.NodeHealthResponse.NodeStatus.FAILURE;
+import static
org.apache.solr.client.api.model.NodeHealthResponse.NodeStatus.OK;
+import static org.apache.solr.common.SolrException.ErrorCode.SERVER_ERROR;
+import static
org.apache.solr.common.SolrException.ErrorCode.SERVICE_UNAVAILABLE;
+import static org.apache.solr.handler.admin.api.ReplicationAPIBase.GENERATION;
+import static org.apache.solr.security.PermissionNameProvider.Name.HEALTH_PERM;
+
+import jakarta.inject.Inject;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+import org.apache.lucene.index.IndexCommit;
+import org.apache.solr.api.JerseyResource;
+import org.apache.solr.client.api.endpoint.NodeHealthApi;
+import org.apache.solr.client.api.model.NodeHealthResponse;
+import org.apache.solr.cloud.CloudDescriptor;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.cloud.ClusterState;
+import org.apache.solr.common.cloud.Replica.State;
+import org.apache.solr.common.cloud.ZkStateReader;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.core.CoreContainer;
+import org.apache.solr.core.CoreDescriptor;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.handler.IndexFetcher;
+import org.apache.solr.handler.ReplicationHandler;
+import org.apache.solr.jersey.PermissionName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * V2 API for checking the health of the receiving node.
+ *
+ * <p>This API (GET /v2/node/health) is analogous to the v1 /admin/info/health.
+ *
+ * <p>The v1 {@link org.apache.solr.handler.admin.HealthCheckHandler}
delegates to this class.
+ */
+public class NodeHealth extends JerseyResource implements NodeHealthApi {
+
+ private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+ private static final List<State> UNHEALTHY_STATES =
Arrays.asList(State.DOWN, State.RECOVERING);
+
+ private final CoreContainer coreContainer;
+
+ @Inject
+ public NodeHealth(CoreContainer coreContainer) {
+ this.coreContainer = coreContainer;
+ }
+
+ @Override
+ @PermissionName(HEALTH_PERM)
+ public NodeHealthResponse healthcheck(Boolean requireHealthyCores) {
+ return checkNodeHealth(requireHealthyCores, null);
+ }
+
+ /**
+ * Performs the node health check and returns the result as a {@link
NodeHealthResponse}.
+ *
+ * <p>This overload is used by the v1 {@link
+ * org.apache.solr.handler.admin.HealthCheckHandler#handleRequestBody} path,
which can supply the
+ * legacy {@code maxGenerationLag} parameter that is not exposed via the v2
endpoint.
+ */
+ public NodeHealthResponse checkNodeHealth(Boolean requireHealthyCores,
Integer maxGenerationLag) {
+ if (coreContainer == null || coreContainer.isShutDown()) {
+ throw new SolrException(
+ SERVER_ERROR, "CoreContainer is either not initialized or shutting
down");
+ }
+
+ final NodeHealthResponse response =
instantiateJerseyResponse(NodeHealthResponse.class);
+
+ if (!coreContainer.isZooKeeperAware()) {
+ if (log.isDebugEnabled()) {
+ log.debug("Invoked HealthCheckHandler in legacy mode.");
+ }
+ healthCheckLegacyMode(response, maxGenerationLag);
+ } else {
+ if (log.isDebugEnabled()) {
+ log.debug(
+ "Invoked HealthCheckHandler in cloud mode on [{}]",
+ coreContainer.getZkController().getNodeName());
+ }
+ healthCheckCloudMode(response, requireHealthyCores);
+ }
+
+ return response;
+ }
+
+ private void healthCheckCloudMode(NodeHealthResponse response, Boolean
requireHealthyCores) {
+ ClusterState clusterState = getClusterState();
+
+ if (Boolean.TRUE.equals(requireHealthyCores)) {
+ if (!coreContainer.isStatusLoadComplete()) {
+ throw new SolrException(SERVICE_UNAVAILABLE, "Host Unavailable: Core
Loading not complete");
+ }
+ Collection<CloudDescriptor> coreDescriptors =
+ coreContainer.getCoreDescriptors().stream()
+ .map(CoreDescriptor::getCloudDescriptor)
+ .collect(Collectors.toList());
+ int unhealthyCores = findUnhealthyCores(coreDescriptors, clusterState);
+ if (unhealthyCores > 0) {
+ response.numCoresUnhealthy = unhealthyCores;
+ throw new SolrException(
+ SERVICE_UNAVAILABLE,
+ unhealthyCores
+ + " out of "
+ + coreContainer.getNumAllCores()
+ + " replicas are currently initializing or recovering");
+ }
+ response.message = "All cores are healthy";
+ }
+
+ response.status = OK;
+ }
+
+ private ClusterState getClusterState() {
+ ZkStateReader zkStateReader =
coreContainer.getZkController().getZkStateReader();
+ ClusterState clusterState = zkStateReader.getClusterState();
+
+ if (zkStateReader.getZkClient().isClosed() ||
!zkStateReader.getZkClient().isConnected()) {
+ throw new SolrException(SERVICE_UNAVAILABLE, "Host Unavailable: Not
connected to zk");
+ }
+
+ if
(!clusterState.getLiveNodes().contains(coreContainer.getZkController().getNodeName()))
{
+ throw new SolrException(SERVICE_UNAVAILABLE, "Host Unavailable: Not in
live nodes as per zk");
+ }
+ return clusterState;
+ }
+
+ private void healthCheckLegacyMode(NodeHealthResponse response, Integer
maxGenerationLag) {
+ List<String> laggingCoresInfo = new ArrayList<>();
+ boolean allCoresAreInSync = true;
+
+ if (maxGenerationLag != null) {
+ if (maxGenerationLag < 0) {
+ log.error("Invalid value for maxGenerationLag:[{}]", maxGenerationLag);
+ response.message =
+ String.format(Locale.ROOT, "Invalid value of maxGenerationLag:%s",
maxGenerationLag);
+ response.status = FAILURE;
+ return;
+ }
+
+ for (SolrCore core : coreContainer.getCores()) {
+ ReplicationHandler replicationHandler =
+ (ReplicationHandler)
core.getRequestHandler(ReplicationHandler.PATH);
+ if (replicationHandler.isFollower()) {
+ boolean isCoreInSync =
+ isWithinGenerationLag(core, replicationHandler,
maxGenerationLag, laggingCoresInfo);
+ allCoresAreInSync &= isCoreInSync;
+ }
+ }
+
+ if (allCoresAreInSync) {
+ response.message =
+ String.format(
+ Locale.ROOT,
+ "All the followers are in sync with leader (within
maxGenerationLag: %d) "
+ + "or the cores are acting as leader",
+ maxGenerationLag);
+ response.status = OK;
+ } else {
+ response.message =
+ String.format(
+ Locale.ROOT,
+ "Cores violating maxGenerationLag:%d.%n%s",
+ maxGenerationLag,
+ String.join(",\n", laggingCoresInfo));
+ response.status = FAILURE;
+ }
+ } else {
+ response.message =
+ "maxGenerationLag isn't specified. Followers aren't "
+ + "checking for the generation lag from the leaders";
+ response.status = OK;
+ }
+ }
+
+ private boolean isWithinGenerationLag(
+ final SolrCore core,
+ ReplicationHandler replicationHandler,
+ int maxGenerationLag,
+ List<String> laggingCoresInfo) {
+ IndexFetcher indexFetcher = null;
+ try {
+ // may not be the best way to get leader's replicableCommit; NamedList
is unavoidable here
+ // as it is the init-args format used by ReplicationHandler
+ NamedList<?> follower = (NamedList<?>)
replicationHandler.getInitArgs().get("follower");
+ indexFetcher = new IndexFetcher(follower, replicationHandler, core);
+ // getLatestVersion() returns a NamedList from the IndexFetcher network
API
+ NamedList<?> replicableCommitOnLeader = indexFetcher.getLatestVersion();
+ long leaderGeneration = (Long) replicableCommitOnLeader.get(GENERATION);
+
+ // Get our own commit and generation from the commit
+ IndexCommit commit = core.getDeletionPolicy().getLatestCommit();
+ if (commit != null) {
+ long followerGeneration = commit.getGeneration();
+ long generationDiff = leaderGeneration - followerGeneration;
+
+ // generationDiff shouldn't be negative except for some edge cases,
log it. Some scenarios
+ // are:
+ // 1) commit generation rolls over Long.MAX_VALUE (really unlikely)
+ // 2) Leader's index is wiped clean and the follower is still showing
commit generation
+ // from the old index
+ if (generationDiff < 0) {
+ log.warn("core:[{}], generation lag:[{}] is negative.", core,
generationDiff);
+ } else if (generationDiff > maxGenerationLag) {
Review Comment:
OK, on further thought:
I think you (and Copilot) are correct that the original else-if condition
here was out-of-sync with the log message. With those conditions and ret-vals
fixed, the log message looks "correct" to me.
If we're suggesting that this **is** an error case though, maybe the message
should be a "warn" to make that clearer. But otherwise I think this is good 👍
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]