rpuch commented on code in PR #5255: URL: https://github.com/apache/ignite-3/pull/5255#discussion_r2002616315
########## modules/network-api/src/main/java/org/apache/ignite/internal/network/DuplicateConsistentIdException.java: ########## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.internal.network; + +import org.apache.ignite.internal.lang.IgniteInternalException; +import org.apache.ignite.lang.ErrorGroups.Network; + +/** + * Thrown when there's a duplicate consistent ID in the physical topology. Review Comment: ```suggestion * Thrown when a send by consistent ID fails because there's a duplicate consistent ID in the physical topology. ``` ########## modules/network/src/main/java/org/apache/ignite/internal/network/scalecube/ScaleCubeTopologyService.java: ########## @@ -140,6 +156,12 @@ void updateLocalMetadata(@Nullable NodeMetadata metadata) { ClusterNode node = fromMember(cluster.member(), metadata); members.put(node.address(), node); membersByConsistentId.computeIfAbsent(node.name(), k -> new ConcurrentHashMap<>()).put(node.id(), node); + membersByConsistentIdInLogicalTopology.compute(node.name(), (consId, clusterNode) -> { Review Comment: ```suggestion membersByConsistentIdInLogicalTopology.compute(node.name(), (consId, prevNode) -> { ``` ########## modules/network/src/main/java/org/apache/ignite/internal/network/scalecube/ScaleCubeTopologyService.java: ########## @@ -191,8 +213,26 @@ public ClusterNode getByAddress(NetworkAddress addr) { /** {@inheritDoc} */ @Override public @Nullable ClusterNode getByConsistentId(String consistentId) { + ClusterNode nodeInLogicalTopology = membersByConsistentIdInLogicalTopology.get(consistentId); + if (nodeInLogicalTopology != null) { + // Node is in the logical topology, check if it's in the physical topology. This could happen when topology is restored on + // node start, but the node is not present anymore + ClusterNode node = idToMemberMap.get(nodeInLogicalTopology.id()); + if (node != null) { + return node; + } + } + + // Node is not in the logical topology, check if it's the only node in the physical topology Map<UUID, ClusterNode> nodes = membersByConsistentId.get(consistentId); - return nodes != null ? nodes.values().iterator().next() : null; + if (nodes == null) { + return null; + } + if (nodes.size() > 1) { + LOG.error("Node \"{}\" has duplicate in the physical topology", consistentId); Review Comment: Let's also log the addresses of the duplicates ########## modules/api/src/main/java/org/apache/ignite/lang/ErrorGroups.java: ########## @@ -500,6 +500,9 @@ public static class Network { /** Could not resolve address. */ public static final int ADDRESS_UNRESOLVED_ERR = NETWORK_ERR_GROUP.registerErrorCode((short) 6); + + /** Duplicate consistent ID. */ + public static final int DUPLICATE_CONSISTENT_ID_ERR = NETWORK_ERR_GROUP.registerErrorCode((short) 7); Review Comment: Do we really need a new public error code? I mean, can exception with this code reach the user? ########## modules/network/src/main/java/org/apache/ignite/internal/network/scalecube/ScaleCubeTopologyService.java: ########## @@ -61,6 +62,9 @@ final class ScaleCubeTopologyService extends AbstractTopologyService { /** Topology members map from the consistent id to the map from the id to the cluster node. */ private final ConcurrentMap<String, Map<UUID, ClusterNode>> membersByConsistentId = new ConcurrentHashMap<>(); + /** Topology members map from the consistent id to the cluster node. Only contains nodes that are joined logical topology. */ Review Comment: ```suggestion /** Topology members map from the consistent id to the cluster node. Only contains nodes that joined logical topology. */ ``` ########## modules/network/src/main/java/org/apache/ignite/internal/network/scalecube/ScaleCubeTopologyService.java: ########## @@ -191,8 +213,26 @@ public ClusterNode getByAddress(NetworkAddress addr) { /** {@inheritDoc} */ @Override public @Nullable ClusterNode getByConsistentId(String consistentId) { + ClusterNode nodeInLogicalTopology = membersByConsistentIdInLogicalTopology.get(consistentId); + if (nodeInLogicalTopology != null) { + // Node is in the logical topology, check if it's in the physical topology. This could happen when topology is restored on Review Comment: It's not clear what 'could happen'. That the node is in the LT, but not in the PT? Anyway, please rephrase. ########## modules/runner/src/main/java/org/apache/ignite/internal/app/IgniteImpl.java: ########## @@ -1250,6 +1252,20 @@ private GroupStoragesContextResolver createGroupStoragesContextResolver() { ); } + private static LogicalTopologyEventListener joinedNodesListener(JoinedNodes joinedNodes) { Review Comment: ```suggestion private static LogicalTopologyEventListener logicalTopologyJoinedNodesListener(JoinedNodes joinedNodes) { ``` ########## modules/network/src/main/java/org/apache/ignite/internal/network/scalecube/ScaleCubeTopologyService.java: ########## @@ -232,4 +272,16 @@ private static NodeMetadata deserializeMetadata(@Nullable ByteBuffer buffer) { return null; } } + + @Override + public void onJoined(ClusterNode node) { + LOG.info("Node joined logical topology [node={}]", node); Review Comment: I'm not sure we need to log it here (and on leave) as INFO. This seems to be useful when debugging this component during development, but in production information about a node being joined the LT is already logged, so this will just add noise. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@ignite.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org