[ 
https://issues.apache.org/jira/browse/CASSANDRA-21097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ConfX updated CASSANDRA-21097:
------------------------------
    Description: 
The `ClusterUtils.parseGossipInfo` method parses the gossip info output from 
`nodetool gossipinfo` by checking the line starts with "/". It expects endpoint 
addresses to start with "/".

This assumption might be broken in the future with the following workflow, let 
me use an existing example to show you.

In the existing test {*}JMXGetterCheckTest.testAllValidGetters(){*}, it call 
ALL JMX attributes including:
{code:java}
// FailureDetectorMBean methods that trigger hostname resolution:
getAllEndpointStatesWithResolveIp()      // resolveIp=true
getAllEndpointStatesWithPortAndResolveIp() // resolveIp=true {code}
These methods call `entry.getKey().getHostName()` on the gossip endpoint 
addresses (`FailureDetector.java:149`), which caches the hostname in the 
underlying `InetAddress` objects.

Then:
{code:java}
1. Test starts, cluster initializes
   └── Gossip endpoints created with InetAddress (hostname NOT resolved)
   └── InetAddress.toString() → "/127.0.0.1"
2. JMXGetterCheckTest calls all JMX getters
   └── Calls getAllEndpointStatesWithResolveIp()
   └── This calls getHostName() on gossip endpoint InetAddress objects
   └── Hostname "localhost" gets CACHED in the InetAddress objects
3. Node restart occurs
   └── Restart adapter calls awaitGossipSchemaMatch()
   └── This runs "nodetool gossipinfo" (resolveIp=false)
   └── But InetAddress.toString() now returns "localhost/127.0.0.1" (cached!)
4. parseGossipInfo() fails
   └── Checks: line.startsWith("/") → FALSE for "localhost/127.0.0.1"
   └── currentInstance remains null
   └── Objects.requireNonNull(currentInstance) → NPE! {code}
 

Code that checks line start with "/":
{code:java}
private static Map<String, Map<String, String>> parseGossipInfo(String str)
{
    Map<String, Map<String, String>> map = new HashMap<>();
    String[] lines = str.split("\n");
    String currentInstance = null;
    for (String line : lines)
    {
        if (line.startsWith("/"))  // BUG: This condition is incorrect
        {
            // start of new instance
            currentInstance = line;
            continue;
        }
        Objects.requireNonNull(currentInstance);  // NPE thrown here
        String[] kv = line.trim().split(":", 2);
        assert kv.length == 2 : "When splitting line '" + line + "' expected 2 
parts but not true";
        Map<String, String> state = map.computeIfAbsent(currentInstance, ignore 
-> new HashMap<>());
        state.put(kv[0], kv[1]);
    }
    return map;
} {code}
Java's `InetAddress` caches hostname at the *object level.* The gossip 
`endpointStateMap` contains shared `InetAddressAndPort` objects. Once ANY code 
path calls `getHostName()` on these objects, all subsequent `toString()` calls 
return the hostname.
h2. Stacktrace
{code:java}
java.lang.NullPointerException
    at java.base/java.util.Objects.requireNonNull(Objects.java:209)
    at 
org.apache.cassandra.distributed.shared.ClusterUtils.parseGossipInfo(ClusterUtils.java:691)
    at 
org.apache.cassandra.distributed.shared.ClusterUtils.gossipInfo(ClusterUtils.java:655)
    at 
org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossip(ClusterUtils.java:554)
    at 
org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossipSchemaMatch(ClusterUtils.java:600)
 {code}

  was:
The `ClusterUtils.parseGossipInfo` method parses the gossip info output from 
`nodetool gossipinfo` by checking the line starts with "/". It expects endpoint 
addresses to start with "/".

This assumption might be broken in the future with the following workflow, let 
me use an existing example to show you.

In the existing test {*}JMXGetterCheckTest.testAllValidGetters(){*}, it call 
ALL JMX attributes including:

 
{code:java}
// FailureDetectorMBean methods that trigger hostname resolution:
getAllEndpointStatesWithResolveIp()      // resolveIp=true
getAllEndpointStatesWithPortAndResolveIp() // resolveIp=true {code}
These methods call `entry.getKey().getHostName()` on the gossip endpoint 
addresses (`FailureDetector.java:149`), which caches the hostname in the 
underlying `InetAddress` objects.

Then:
{code:java}
1. Test starts, cluster initializes
   └── Gossip endpoints created with InetAddress (hostname NOT resolved)
   └── InetAddress.toString() → "/127.0.0.1"
2. JMXGetterCheckTest calls all JMX getters
   └── Calls getAllEndpointStatesWithResolveIp()
   └── This calls getHostName() on gossip endpoint InetAddress objects
   └── Hostname "localhost" gets CACHED in the InetAddress objects
3. Node restart occurs
   └── Restart adapter calls awaitGossipSchemaMatch()
   └── This runs "nodetool gossipinfo" (resolveIp=false)
   └── But InetAddress.toString() now returns "localhost/127.0.0.1" (cached!)
4. parseGossipInfo() fails
   └── Checks: line.startsWith("/") → FALSE for "localhost/127.0.0.1"
   └── currentInstance remains null
   └── Objects.requireNonNull(currentInstance) → NPE! {code}
 

Code that checks line start with "/":
{code:java}
private static Map<String, Map<String, String>> parseGossipInfo(String str)
{
    Map<String, Map<String, String>> map = new HashMap<>();
    String[] lines = str.split("\n");
    String currentInstance = null;
    for (String line : lines)
    {
        if (line.startsWith("/"))  // BUG: This condition is incorrect
        {
            // start of new instance
            currentInstance = line;
            continue;
        }
        Objects.requireNonNull(currentInstance);  // NPE thrown here
        String[] kv = line.trim().split(":", 2);
        assert kv.length == 2 : "When splitting line '" + line + "' expected 2 
parts but not true";
        Map<String, String> state = map.computeIfAbsent(currentInstance, ignore 
-> new HashMap<>());
        state.put(kv[0], kv[1]);
    }
    return map;
} {code}
Java's `InetAddress` caches hostname at the *object level.* The gossip 
`endpointStateMap` contains shared `InetAddressAndPort` objects. Once ANY code 
path calls `getHostName()` on these objects, all subsequent `toString()` calls 
return the hostname.
h2. Stacktrace
{code:java}
java.lang.NullPointerException
    at java.base/java.util.Objects.requireNonNull(Objects.java:209)
    at 
org.apache.cassandra.distributed.shared.ClusterUtils.parseGossipInfo(ClusterUtils.java:691)
    at 
org.apache.cassandra.distributed.shared.ClusterUtils.gossipInfo(ClusterUtils.java:655)
    at 
org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossip(ClusterUtils.java:554)
    at 
org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossipSchemaMatch(ClusterUtils.java:600)
 {code}


> ClusterUtils.parseGossipInfo improper assumption might fail to parse gossip 
> output after a node restart
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21097
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21097
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/java
>            Reporter: ConfX
>            Priority: Normal
>
> The `ClusterUtils.parseGossipInfo` method parses the gossip info output from 
> `nodetool gossipinfo` by checking the line starts with "/". It expects 
> endpoint addresses to start with "/".
> This assumption might be broken in the future with the following workflow, 
> let me use an existing example to show you.
> In the existing test {*}JMXGetterCheckTest.testAllValidGetters(){*}, it call 
> ALL JMX attributes including:
> {code:java}
> // FailureDetectorMBean methods that trigger hostname resolution:
> getAllEndpointStatesWithResolveIp()      // resolveIp=true
> getAllEndpointStatesWithPortAndResolveIp() // resolveIp=true {code}
> These methods call `entry.getKey().getHostName()` on the gossip endpoint 
> addresses (`FailureDetector.java:149`), which caches the hostname in the 
> underlying `InetAddress` objects.
> Then:
> {code:java}
> 1. Test starts, cluster initializes
>    └── Gossip endpoints created with InetAddress (hostname NOT resolved)
>    └── InetAddress.toString() → "/127.0.0.1"
> 2. JMXGetterCheckTest calls all JMX getters
>    └── Calls getAllEndpointStatesWithResolveIp()
>    └── This calls getHostName() on gossip endpoint InetAddress objects
>    └── Hostname "localhost" gets CACHED in the InetAddress objects
> 3. Node restart occurs
>    └── Restart adapter calls awaitGossipSchemaMatch()
>    └── This runs "nodetool gossipinfo" (resolveIp=false)
>    └── But InetAddress.toString() now returns "localhost/127.0.0.1" (cached!)
> 4. parseGossipInfo() fails
>    └── Checks: line.startsWith("/") → FALSE for "localhost/127.0.0.1"
>    └── currentInstance remains null
>    └── Objects.requireNonNull(currentInstance) → NPE! {code}
>  
> Code that checks line start with "/":
> {code:java}
> private static Map<String, Map<String, String>> parseGossipInfo(String str)
> {
>     Map<String, Map<String, String>> map = new HashMap<>();
>     String[] lines = str.split("\n");
>     String currentInstance = null;
>     for (String line : lines)
>     {
>         if (line.startsWith("/"))  // BUG: This condition is incorrect
>         {
>             // start of new instance
>             currentInstance = line;
>             continue;
>         }
>         Objects.requireNonNull(currentInstance);  // NPE thrown here
>         String[] kv = line.trim().split(":", 2);
>         assert kv.length == 2 : "When splitting line '" + line + "' expected 
> 2 parts but not true";
>         Map<String, String> state = map.computeIfAbsent(currentInstance, 
> ignore -> new HashMap<>());
>         state.put(kv[0], kv[1]);
>     }
>     return map;
> } {code}
> Java's `InetAddress` caches hostname at the *object level.* The gossip 
> `endpointStateMap` contains shared `InetAddressAndPort` objects. Once ANY 
> code path calls `getHostName()` on these objects, all subsequent `toString()` 
> calls return the hostname.
> h2. Stacktrace
> {code:java}
> java.lang.NullPointerException
>     at java.base/java.util.Objects.requireNonNull(Objects.java:209)
>     at 
> org.apache.cassandra.distributed.shared.ClusterUtils.parseGossipInfo(ClusterUtils.java:691)
>     at 
> org.apache.cassandra.distributed.shared.ClusterUtils.gossipInfo(ClusterUtils.java:655)
>     at 
> org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossip(ClusterUtils.java:554)
>     at 
> org.apache.cassandra.distributed.shared.ClusterUtils.awaitGossipSchemaMatch(ClusterUtils.java:600)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to