[ https://issues.apache.org/jira/browse/KUDU-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek updated KUDU-3278:
---------------------------
    Description:
Steps to replicate:
Let's say a tablet, T1 has three replicas in tablet servers TS1,TS2,TS3.
If TS1 and TS2 are unable to resolve TS3, one of TS1/TS2 ends up crashing during election/pre-elections irrespective of TS3 state (running/not-running):

Sample failure logs:
{code:java}
W0429 04:14:11.043696 801167 leader_election.cc:270] T ecf3e9d1608a4d03ac69a09f0df54b9e P b4eb8f7b19dd4b94a313d8674779b350 [CANDIDATE]: Term 9 election: Was unable to construct an RPC proxy to peer dddc42c5a10b461cb92465815413e996: Network error: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known. Counting it as a 'NO' vote.
F0429 04:14:11.046133 801167 raft_consensus.cc:2743] Check failed: _s.ok() Bad status: Network error: Could not obtain a remote proxy to the peer.: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known
{code}

  was:
Steps to replicate:
Let's say a tablet, T1 has three replicas in tablet servers TS1,TS2,TS3.
If TS1 and TS2 are unable to resolve TS3, one of TS1/TS2 ends up crashing during election/pre-elections irrespective of TS3 state (running/not-running):

Sample failure logs:
W0429 04:14:11.043696 801167 leader_election.cc:270] T ecf3e9d1608a4d03ac69a09f0df54b9e P b4eb8f7b19dd4b94a313d8674779b350 [CANDIDATE]: Term 9 election: Was unable to construct an RPC proxy to peer dddc42c5a10b461cb92465815413e996: Network error: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known. Counting it as a 'NO' vote.
F0429 04:14:11.046133 801167 raft_consensus.cc:2743] Check failed: _s.ok() Bad status: Network error: Could not obtain a remote proxy to the peer.: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known

> DNS entry removal of a tablet server causes one of its peers to crash
> ---------------------------------------------------------------------
>
>                 Key: KUDU-3278
>                 URL: https://issues.apache.org/jira/browse/KUDU-3278
>             Project: Kudu
>          Issue Type: Improvement
>          Components: consensus, tserver
>    Affects Versions: 1.10.0, 1.14.0
>            Reporter: Abhishek
>            Priority: Major
>
> Steps to replicate:
> Let's say a tablet, T1 has three replicas in tablet servers TS1,TS2,TS3.
> If TS1 and TS2 are unable to resolve TS3, one of TS1/TS2 ends up crashing
> during election/pre-elections irrespective of TS3 state (running/not-running):
> Sample failure logs:
> {code:java}
> W0429 04:14:11.043696 801167 leader_election.cc:270] T
> ecf3e9d1608a4d03ac69a09f0df54b9e P b4eb8f7b19dd4b94a313d8674779b350
> [CANDIDATE]: Term 9 election: Was unable to construct an RPC proxy to peer
> dddc42c5a10b461cb92465815413e996: Network error: unable to resolve address
> for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known.
> Counting it as a 'NO' vote.
> F0429 04:14:11.046133 801167 raft_consensus.cc:2743] Check failed: _s.ok()
> Bad status: Network error: Could not obtain a remote proxy to the peer.:
> unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site:
> Name or service not known{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
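
Illustrative note on the two log lines in the report: the warning from leader_election.cc treats the unresolvable peer as a 'NO' vote and carries on, while the fatal line from raft_consensus.cc aborts the process on the same kind of error. The sketch below shows only the non-fatal pattern. It is not Kudu's code: `Status`, `Resolver`, `ElectionResult`, and `RunElection` are hypothetical names made up for this example, and it assumes, as a simplification, that every peer that resolves grants its vote.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; this is NOT kudu::Status.
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status NetworkError(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical resolver hook: returns OK if the peer's hostname resolves.
using Resolver = std::function<Status(const std::string& host)>;

struct ElectionResult {
  int yes_votes = 0;
  int no_votes = 0;
  // A candidate wins with a strict majority of all voters.
  bool won() const { return yes_votes * 2 > yes_votes + no_votes; }
};

// Tallies votes for a candidate. The candidate votes for itself. A peer whose
// address cannot be resolved is counted as a 'NO' vote instead of aborting.
// Simplifying assumption: every peer that resolves grants its vote.
ElectionResult RunElection(const std::vector<std::string>& peer_hosts,
                           const Resolver& resolve) {
  ElectionResult result;
  result.yes_votes = 1;  // self vote
  for (const auto& host : peer_hosts) {
    const Status s = resolve(host);
    if (!s.ok) {
      ++result.no_votes;  // non-fatal: record the failure and keep going
      continue;
    }
    ++result.yes_votes;
  }
  return result;
}
```

With three replicas where only the third peer fails to resolve, this tally gives two 'yes' votes and one 'NO' vote, so the candidate still has a majority and nothing crashes. Whether the check at raft_consensus.cc:2743 can safely be relaxed in the same way is a question for the Kudu developers; the report alone does not settle it.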