[
https://issues.apache.org/jira/browse/CASSANDRA-21097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ConfX updated CASSANDRA-21097:
------------------------------
Description:
The `ClusterUtils.parseGossipInfo` method parses the gossip info output from
`nodetool gossipinfo` by checking the line starts with "/". It expects endpoint
addresses to start with "/".
This assumption might be broken in the future with the following workflow, let
me use an existing example to show you.
In the existing test {*}JMXGetterCheckTest.testAllValidGetters(){*}, it call
ALL JMX attributes including:
{code:java}
// FailureDetectorMBean methods that trigger hostname resolution:
getAllEndpointStatesWithResolveIp() // resolveIp=true
getAllEndpointStatesWithPortAndResolveIp() // resolveIp=true {code}
These methods call `entry.getKey().getHostName()` on the gossip endpoint
addresses (`FailureDetector.java:149`), which caches the hostname in the
underlying `InetAddress` objects.
Then:
{code:java}
1. Test starts, cluster initializes
└── Gossip endpoints created with InetAddress (hostname NOT resolved)
└── InetAddress.toString() → "/127.0.0.1"
2. JMXGetterCheckTest calls all JMX getters
└── Calls getAllEndpointStatesWithResolveIp()
└── This calls getHostName() on gossip endpoint InetAddress objects
└── Hostname "localhost" gets CACHED in the InetAddress objects
3. Node restart occurs
└── Restart adapter calls awaitGossipSchemaMatch()
└── This runs "nodetool gossipinfo" (resolveIp=false)
└── But InetAddress.toString() now returns "localhost/127.0.0.1" (cached!)
4. parseGossipInfo() fails
└── Checks: line.startsWith("/") → FALSE for "localhost/127.0.0.1"
└── currentInstance remains null
└── Objects.requireNonNull(currentInstance) → NPE! {code}
Code that checks line start with "/":
{code:java}
private static Map<String, Map<String, String>> parseGossipInfo(String str)
{
Map<String, Map<String, String>> map = new HashMap<>();
String[] lines = str.split("\n");
String currentInstance = null;
for (String line : lines)
{
if (line.startsWith("/")) // BUG: This condition is incorrect
{
// start of new instance
currentInstance = line;
continue;
}
Objects.requireNonNull(currentInstance); // NPE thrown here
String[] kv = line.trim().split(":", 2);
assert kv.length == 2 : "When splitting line '" + line + "' expected 2
parts but not true";
Map<String, String> state = map.computeIfAbsent(currentInstance, ignore
-> new HashMap<>());
state.put(kv[0], kv[1]);
}
return map;
} {code}
Java's `InetAddress` caches hostname at the *object level.* The gossip
`endpointStateMap` contains shared `InetAddressAndPort` objects. Once ANY code
path calls `getHostName()` on these objects, all subsequent `toString()` calls
return the hostname.
h2. Stacktrace
{code:java}
java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:209)
at
org.apache.cassandra.distributed.shared.ClusterUtils.parseGossipInfo(ClusterUtils.java:691)
at
org.apache.cassandra.distributed.shared.ClusterUtils.gossipInfo(ClusterUtils.java:655)
at
org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossip(ClusterUtils.java:554)
at
org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossipSchemaMatch(ClusterUtils.java:600)
{code}
was:
The `ClusterUtils.parseGossipInfo` method parses the gossip info output from
`nodetool gossipinfo` by checking the line starts with "/". It expects endpoint
addresses to start with "/".
This assumption might be broken in the future with the following workflow, let
me use an existing example to show you.
In the existing test {*}JMXGetterCheckTest.testAllValidGetters(){*}, it call
ALL JMX attributes including:
{code:java}
// FailureDetectorMBean methods that trigger hostname resolution:
getAllEndpointStatesWithResolveIp() // resolveIp=true
getAllEndpointStatesWithPortAndResolveIp() // resolveIp=true {code}
These methods call `entry.getKey().getHostName()` on the gossip endpoint
addresses (`FailureDetector.java:149`), which caches the hostname in the
underlying `InetAddress` objects.
Then:
{code:java}
1. Test starts, cluster initializes
└── Gossip endpoints created with InetAddress (hostname NOT resolved)
└── InetAddress.toString() → "/127.0.0.1"
2. JMXGetterCheckTest calls all JMX getters
└── Calls getAllEndpointStatesWithResolveIp()
└── This calls getHostName() on gossip endpoint InetAddress objects
└── Hostname "localhost" gets CACHED in the InetAddress objects
3. Node restart occurs
└── Restart adapter calls awaitGossipSchemaMatch()
└── This runs "nodetool gossipinfo" (resolveIp=false)
└── But InetAddress.toString() now returns "localhost/127.0.0.1" (cached!)
4. parseGossipInfo() fails
└── Checks: line.startsWith("/") → FALSE for "localhost/127.0.0.1"
└── currentInstance remains null
└── Objects.requireNonNull(currentInstance) → NPE! {code}
Code that checks line start with "/":
{code:java}
private static Map<String, Map<String, String>> parseGossipInfo(String str)
{
Map<String, Map<String, String>> map = new HashMap<>();
String[] lines = str.split("\n");
String currentInstance = null;
for (String line : lines)
{
if (line.startsWith("/")) // BUG: This condition is incorrect
{
// start of new instance
currentInstance = line;
continue;
}
Objects.requireNonNull(currentInstance); // NPE thrown here
String[] kv = line.trim().split(":", 2);
assert kv.length == 2 : "When splitting line '" + line + "' expected 2
parts but not true";
Map<String, String> state = map.computeIfAbsent(currentInstance, ignore
-> new HashMap<>());
state.put(kv[0], kv[1]);
}
return map;
} {code}
Java's `InetAddress` caches hostname at the *object level.* The gossip
`endpointStateMap` contains shared `InetAddressAndPort` objects. Once ANY code
path calls `getHostName()` on these objects, all subsequent `toString()` calls
return the hostname.
h2. Stacktrace
{code:java}
java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:209)
at
org.apache.cassandra.distributed.shared.ClusterUtils.parseGossipInfo(ClusterUtils.java:691)
at
org.apache.cassandra.distributed.shared.ClusterUtils.gossipInfo(ClusterUtils.java:655)
at
org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossip(ClusterUtils.java:554)
at
org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossipSchemaMatch(ClusterUtils.java:600)
{code}
> ClusterUtils.parseGossipInfo improper assumption might fail to parse gossip
> output after a node restart
> -------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21097
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: ConfX
> Priority: Normal
>
> The `ClusterUtils.parseGossipInfo` method parses the gossip info output from
> `nodetool gossipinfo` by checking the line starts with "/". It expects
> endpoint addresses to start with "/".
> This assumption might be broken in the future with the following workflow,
> let me use an existing example to show you.
> In the existing test {*}JMXGetterCheckTest.testAllValidGetters(){*}, it call
> ALL JMX attributes including:
> {code:java}
> // FailureDetectorMBean methods that trigger hostname resolution:
> getAllEndpointStatesWithResolveIp() // resolveIp=true
> getAllEndpointStatesWithPortAndResolveIp() // resolveIp=true {code}
> These methods call `entry.getKey().getHostName()` on the gossip endpoint
> addresses (`FailureDetector.java:149`), which caches the hostname in the
> underlying `InetAddress` objects.
> Then:
> {code:java}
> 1. Test starts, cluster initializes
> └── Gossip endpoints created with InetAddress (hostname NOT resolved)
> └── InetAddress.toString() → "/127.0.0.1"
> 2. JMXGetterCheckTest calls all JMX getters
> └── Calls getAllEndpointStatesWithResolveIp()
> └── This calls getHostName() on gossip endpoint InetAddress objects
> └── Hostname "localhost" gets CACHED in the InetAddress objects
> 3. Node restart occurs
> └── Restart adapter calls awaitGossipSchemaMatch()
> └── This runs "nodetool gossipinfo" (resolveIp=false)
> └── But InetAddress.toString() now returns "localhost/127.0.0.1" (cached!)
> 4. parseGossipInfo() fails
> └── Checks: line.startsWith("/") → FALSE for "localhost/127.0.0.1"
> └── currentInstance remains null
> └── Objects.requireNonNull(currentInstance) → NPE! {code}
>
> Code that checks line start with "/":
> {code:java}
> private static Map<String, Map<String, String>> parseGossipInfo(String str)
> {
> Map<String, Map<String, String>> map = new HashMap<>();
> String[] lines = str.split("\n");
> String currentInstance = null;
> for (String line : lines)
> {
> if (line.startsWith("/")) // BUG: This condition is incorrect
> {
> // start of new instance
> currentInstance = line;
> continue;
> }
> Objects.requireNonNull(currentInstance); // NPE thrown here
> String[] kv = line.trim().split(":", 2);
> assert kv.length == 2 : "When splitting line '" + line + "' expected
> 2 parts but not true";
> Map<String, String> state = map.computeIfAbsent(currentInstance,
> ignore -> new HashMap<>());
> state.put(kv[0], kv[1]);
> }
> return map;
> } {code}
> Java's `InetAddress` caches hostname at the *object level.* The gossip
> `endpointStateMap` contains shared `InetAddressAndPort` objects. Once ANY
> code path calls `getHostName()` on these objects, all subsequent `toString()`
> calls return the hostname.
> h2. Stacktrace
> {code:java}
> java.lang.NullPointerException
> at java.base/java.util.Objects.requireNonNull(Objects.java:209)
> at
> org.apache.cassandra.distributed.shared.ClusterUtils.parseGossipInfo(ClusterUtils.java:691)
> at
> org.apache.cassandra.distributed.shared.ClusterUtils.gossipInfo(ClusterUtils.java:655)
> at
> org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossip(ClusterUtils.java:554)
> at
> org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossipSchemaMatch(ClusterUtils.java:600)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]