yomipq opened a new issue, #1873:
URL: https://github.com/apache/cassandra-gocql-driver/issues/1873

   I am having trouble connecting to Amazon Keyspaces (for Apache Cassandra). 
My program on EC2 connects to Amazon Keyspaces without any issues for a 
while, but after a few days or weeks it loses the connection, and every query 
then fails with the error below.
   
   ```
   gocql: no hosts available in the pool
   ```
   
   Go version: 1.23.4
   GoCQL version: 1.7.0
   
   I built the program with gocql_debug enabled and got the following logs.
   
   ```
   2025/03/25 03:20:29 gocql: Session.handleNodeConnected: 172.16.1.14:9142
   2025/03/25 03:20:29 gocql: conns of pool after stopped "172.16.1.14": 2
   2025/03/25 03:20:29 gocql: Session.handleNodeConnected: 172.16.1.28:9142
   2025/03/25 03:20:29 gocql: conns of pool after stopped "172.16.1.28": 2
   2025/03/25 03:21:29 Session.ring:[172.16.1.14:UP][172.16.1.28:UP]
   
   ...
   
   2025/03/26 15:11:11 gocql: unable to dial "[HostInfo hostname=\"\" 
connectAddress=\"127.0.0.1\" peer=\"<nil>\" rpc_address=\"127.0.0.1\" 
broadcast_address=\"127.0.0.1\" preferred_ip=\"<nil>\" 
connect_addr=\"127.0.0.1\" connect_addr_source=\"connect_address\" port=9142 
data_centre=\"ap-northeast-1\" rack=\"ap-northeast-1\" 
host_id=\"be0f3a14-e107-3fee-a5e5-415c10539abd\" version=\"v3.11.2\" state=UP 
num_tokens=0]": dial tcp 127.0.0.1:9142: connect: connection refused
   2025/03/26 15:11:11 gocql: filling stopped "127.0.0.1": dial tcp 
127.0.0.1:9142: connect: connection refused
   2025/03/26 15:11:11 gocql: conns of pool after stopped "127.0.0.1": 0
   2025/03/26 15:11:11 gocql: Session.handleNodeDown: 127.0.0.1:9142
   2025/03/26 15:11:11 gocql: unable to refresh ring: get existing 
host=[HostInfo hostname="" connectAddress="172.16.1.14" peer="172.16.1.14" 
rpc_address="172.16.1.14" broadcast_address="<nil>" preferred_ip="172.16.1.14" 
connect_addr="172.16.1.14" connect_addr_source="connect_address" port=9142 
data_centre="ap-northeast-1" rack="ap-northeast-1" 
host_id="be0f3a14-e107-3fee-a5e5-415c10539abd" version="v3.11.2" state=UP 
num_tokens=1] from prevHosts: cannot find host
   2025/03/26 15:11:29 Session.ring:[127.0.0.1:DOWN][172.16.1.28:UP]
   
   ...
   
   2025/03/26 22:43:35 gocql: unable to dial "[HostInfo hostname=\"\" 
connectAddress=\"127.0.0.1\" peer=\"<nil>\" rpc_address=\"127.0.0.1\" 
broadcast_address=\"127.0.0.1\" preferred_ip=\"<nil>\" 
connect_addr=\"127.0.0.1\" connect_addr_source=\"connect_address\" port=9142 
data_centre=\"ap-northeast-1\" rack=\"ap-northeast-1\" 
host_id=\"b666465e-cb85-3efa-b3ab-f6cf139e5a39\" version=\"v3.11.2\" state=UP 
num_tokens=0]": dial tcp 127.0.0.1:9142: connect: connection refused
   2025/03/26 22:43:35 gocql: filling stopped "127.0.0.1": dial tcp 
127.0.0.1:9142: connect: connection refused
   2025/03/26 22:43:35 gocql: conns of pool after stopped "127.0.0.1": 0
   2025/03/26 22:43:35 gocql: Session.handleNodeDown: 127.0.0.1:9142
   2025/03/26 22:43:35 gocql: unable to refresh ring: get existing 
host=[HostInfo hostname="" connectAddress="172.16.1.28" peer="172.16.1.28" 
rpc_address="172.16.1.28" broadcast_address="<nil>" preferred_ip="172.16.1.28" 
connect_addr="172.16.1.28" connect_addr_source="connect_address" port=9142 
data_centre="ap-northeast-1" rack="ap-northeast-1" 
host_id="b666465e-cb85-3efa-b3ab-f6cf139e5a39" version="v3.11.2" state=UP 
num_tokens=1] from prevHosts: cannot find host
   2025/03/26 22:44:29 Session.ring:[127.0.0.1:DOWN][127.0.0.1:DOWN]
   ```
   
   On startup, the session has two hosts, 172.16.1.14 and 172.16.1.28. After a 
while, the connection to 172.16.1.14 was lost with the error `cannot find host`, 
and the driver tried to reconnect to 127.0.0.1 instead of 172.16.1.14. Some 
time later, the other connection was lost with the same error, and the driver 
again tried to reconnect to 127.0.0.1 instead of 172.16.1.28. As a result, all 
connections were lost.
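
   For anyone trying to spot the same pattern in their own debug output, here 
is a minimal sketch that scans `Session.ring:` lines and counts how many hosts 
have a loopback address. It uses only the Go standard library and is not part 
of gocql; the names `ringEntry`, `parseRing`, and `loopbackHosts` are 
hypothetical helpers, and the line format is assumed to match the logs above.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
	"strings"
)

// ringEntry is one "[address:STATE]" pair from a Session.ring debug line.
// (Hypothetical helper type, not a gocql type.)
type ringEntry struct {
	Addr  string
	State string
}

// Matches entries such as "[172.16.1.14:UP]" or "[127.0.0.1:DOWN]".
// Only the UP and DOWN states seen in the logs above are handled.
var ringEntryRe = regexp.MustCompile(`\[([^\[\]]+):(UP|DOWN)\]`)

// parseRing extracts the host entries from a "Session.ring:" log line.
// It returns nil when the line is not a ring line.
func parseRing(line string) []ringEntry {
	idx := strings.Index(line, "Session.ring:")
	if idx < 0 {
		return nil
	}
	var entries []ringEntry
	for _, m := range ringEntryRe.FindAllStringSubmatch(line[idx:], -1) {
		entries = append(entries, ringEntry{Addr: m[1], State: m[2]})
	}
	return entries
}

// loopbackHosts counts the entries whose address parses as a loopback IP.
func loopbackHosts(entries []ringEntry) int {
	n := 0
	for _, e := range entries {
		if ip := net.ParseIP(e.Addr); ip != nil && ip.IsLoopback() {
			n++
		}
	}
	return n
}

func main() {
	// The three ring lines from the logs above.
	lines := []string{
		"2025/03/25 03:21:29 Session.ring:[172.16.1.14:UP][172.16.1.28:UP]",
		"2025/03/26 15:11:29 Session.ring:[127.0.0.1:DOWN][172.16.1.28:UP]",
		"2025/03/26 22:44:29 Session.ring:[127.0.0.1:DOWN][127.0.0.1:DOWN]",
	}
	for _, l := range lines {
		entries := parseRing(l)
		fmt.Printf("%d of %d hosts are loopback\n", loopbackHosts(entries), len(entries))
	}
}
```

   Once every host in a ring line is loopback, the pool has nothing left to 
dial, which matches the point at which `no hosts available in the pool` starts 
appearing.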
   
   So here are my questions:
   
   First, in what situation does the error `cannot find host` occur? Is it an 
expected error? I read the source code, but I couldn't understand it well.
   Second, what makes the driver reconnect to 127.0.0.1 instead of the original 
address? Is this expected behavior?
   
   If anyone has any idea, please let me know.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

