nvollmar opened a new issue, #1182:
URL: https://github.com/apache/incubator-pekko/issues/1182

   I uncovered this while investigating cluster issues in our nightly deployment 
test. Since we started using a low-power CPU governor at night, we have been 
seeing problems with Pekko cluster formation during the nightly deployment.
   
   I've tracked it down to the `TcpDnsClient` / `TcpConnection` initialization 
timing out, which leaves the `TcpDnsClient` in a state it cannot recover from: 
it never responds to any requests.
   
   The `TcpOutgoingConnection` connects and replies with a `Tcp.Connected` 
message to the `TcpDnsClient`, which in turn registers itself with the 
connection:
   
https://github.com/apache/incubator-pekko/blob/46e60a61fbabce5e3f36a408bfa3d1fb249eef44/actor/src/main/scala/org/apache/pekko/io/dns/internal/TcpDnsClient.scala#L52
   
   
   If that registration message arrives too late (after the connection's 
registration timeout), the `TcpOutgoingConnection` stops itself, and 
`TcpDnsClient` has no way to detect or handle this case:
   
   
https://github.com/apache/incubator-pekko/blob/46e60a61fbabce5e3f36a408bfa3d1fb249eef44/actor/src/main/scala/org/apache/pekko/io/TcpConnection.scala#L104
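   To make the failure mode concrete, here is a minimal, hypothetical stand-in for the connection side of the handshake. It is not Pekko's actual `TcpConnection`; the class name `RegistrationTimeoutConnection` and its `registerTimeout` parameter are assumptions for illustration. It arms a receive timeout when created and stops itself if no `Tcp.Register` arrives in time. Because no handler has been registered at that point, nothing is sent to the client, so only an actor that watches the connection can notice that it stopped.

```scala
import scala.concurrent.duration._

import org.apache.pekko.actor.{ Actor, ActorLogging, ActorRef, ReceiveTimeout }
import org.apache.pekko.io.Tcp

// Hypothetical stand-in for the connection side; NOT Pekko's TcpConnection,
// only a model of the "stop itself if Register arrives too late" behaviour.
class RegistrationTimeoutConnection(registerTimeout: FiniteDuration) extends Actor with ActorLogging {

  // Start the registration clock as soon as the "connection" exists.
  context.setReceiveTimeout(registerTimeout)

  override def receive: Receive = waitingForRegistration

  private def waitingForRegistration: Receive = {
    case r: Tcp.Register =>
      context.setReceiveTimeout(Duration.Undefined) // registered in time: disarm the timeout
      context.become(registered(r.handler))
    case ReceiveTimeout =>
      // No handler is known yet, so nobody is notified about this stop;
      // only actors watching this one (DeathWatch) will receive `Terminated`.
      log.debug("Registration timeout of [{}] expired, stopping", registerTimeout)
      context.stop(self)
  }

  private def registered(handler: ActorRef): Receive = {
    case w: Tcp.Write =>
      log.debug("Would write [{}] bytes for [{}]", w.data.size, handler)
  }
}
```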
   
   This is a very unusual case, but it happens on almost every deployment for 
one or two pods when the system is in low-power mode.
   
   Proposed fix: `TcpDnsClient` must watch the connection and fail when it 
terminates, so that it is re-initialized (restarting is already handled by a 
backoff supervisor).
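   A minimal sketch of the proposed fix, assuming a simplified client. `WatchingTcpClient`, its raw-`ByteString` requests and `supervisedProps` are hypothetical names, not Pekko's actual `TcpDnsClient`. The client calls `context.watch` on the connection when it registers with it, and treats `Terminated` as a failure, so that a backoff supervisor recreates it and the connect/register handshake runs again.

```scala
import java.net.InetSocketAddress

import scala.concurrent.duration._

import org.apache.pekko.actor.{ Actor, ActorLogging, ActorRef, Props, Stash, Terminated }
import org.apache.pekko.io.Tcp
import org.apache.pekko.pattern.{ BackoffOpts, BackoffSupervisor }
import org.apache.pekko.util.ByteString

// Hypothetical, simplified client: requests are raw ByteStrings and
// responses are only logged. `tcp` is the Tcp manager actor.
class WatchingTcpClient(tcp: ActorRef, ns: InetSocketAddress) extends Actor with Stash with ActorLogging {

  override def receive: Receive = idle

  // No connection yet: keep the request and start connecting.
  private def idle: Receive = {
    case _: ByteString =>
      stash()
      tcp ! Tcp.Connect(ns)
      context.become(connecting)
  }

  private def connecting: Receive = {
    case Tcp.CommandFailed(_: Tcp.Connect) =>
      throw new RuntimeException(s"Failed to connect to TCP server at [$ns]")
    case _: Tcp.Connected =>
      val connection = sender()
      // If `Register` arrives after the connection's registration timeout, the
      // connection stops without telling us, so `Terminated` is the only signal.
      // Watching an actor that has already stopped still delivers `Terminated`.
      context.watch(connection)
      connection ! Tcp.Register(self)
      context.become(ready(connection))
      unstashAll()
    case _: ByteString =>
      stash()
  }

  private def ready(connection: ActorRef): Receive = {
    case request: ByteString =>
      connection ! Tcp.Write(request)
    case Tcp.Received(data) =>
      log.debug("Received [{}] bytes from [{}]", data.size, ns)
    case Terminated(`connection`) =>
      // Fail so the enclosing backoff supervisor recreates this actor, which
      // then runs the connect/register handshake again from `idle`.
      throw new IllegalStateException(s"TCP connection to [$ns] terminated")
  }
}

object WatchingTcpClient {
  // Restart-on-failure backoff supervision; the durations are illustrative.
  def supervisedProps(tcp: ActorRef, ns: InetSocketAddress): Props =
    BackoffSupervisor.props(
      BackoffOpts.onFailure(
        Props(new WatchingTcpClient(tcp, ns)),
        "tcpClient",
        10.millis,
        20.seconds,
        0.1))
}
```

   In practice `tcp` would be the manager obtained with `IO(Tcp)`. Treating every termination as a failure keeps the sketch small; a real client might want to tell an orderly close apart from an unexpected stop before deciding to fail.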
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.