[ 
https://issues.apache.org/jira/browse/KUDU-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek updated KUDU-3278:
---------------------------
    Description: 
Steps to replicate:

Suppose a tablet T1 has three replicas, hosted on tablet servers TS1, TS2, and TS3.

If TS1 and TS2 are unable to resolve TS3's hostname, one of TS1/TS2 ends up crashing 
during an election or pre-election, regardless of whether TS3 is running:

Sample failure logs:
{code:java}
W0429 04:14:11.043696 801167 leader_election.cc:270] T ecf3e9d1608a4d03ac69a09f0df54b9e P b4eb8f7b19dd4b94a313d8674779b350 [CANDIDATE]: Term 9 election: Was unable to construct an RPC proxy to peer dddc42c5a10b461cb92465815413e996: Network error: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known. Counting it as a 'NO' vote.
F0429 04:14:11.046133 801167 raft_consensus.cc:2743] Check failed: _s.ok() Bad status: Network error: Could not obtain a remote proxy to the peer.: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known
{code}

  was:
Steps to replicate:

Suppose a tablet T1 has three replicas, hosted on tablet servers TS1, TS2, and TS3.

If TS1 and TS2 are unable to resolve TS3's hostname, one of TS1/TS2 ends up crashing 
during an election or pre-election, regardless of whether TS3 is running:

Sample failure logs:

W0429 04:14:11.043696 801167 leader_election.cc:270] T ecf3e9d1608a4d03ac69a09f0df54b9e P b4eb8f7b19dd4b94a313d8674779b350 [CANDIDATE]: Term 9 election: Was unable to construct an RPC proxy to peer dddc42c5a10b461cb92465815413e996: Network error: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known. Counting it as a 'NO' vote.
F0429 04:14:11.046133 801167 raft_consensus.cc:2743] Check failed: _s.ok() Bad status: Network error: Could not obtain a remote proxy to the peer.: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known


> DNS entry removal of a tablet server causes one of its peers to crash
> ---------------------------------------------------------------------
>
>                 Key: KUDU-3278
>                 URL: https://issues.apache.org/jira/browse/KUDU-3278
>             Project: Kudu
>          Issue Type: Improvement
>          Components: consensus, tserver
>    Affects Versions: 1.10.0, 1.14.0
>            Reporter: Abhishek
>            Priority: Major
>
> Steps to replicate:
> Suppose a tablet T1 has three replicas, hosted on tablet servers TS1, TS2, and TS3.
> If TS1 and TS2 are unable to resolve TS3's hostname, one of TS1/TS2 ends up crashing 
> during an election or pre-election, regardless of whether TS3 is running:
> Sample failure logs:
> {code:java}
> W0429 04:14:11.043696 801167 leader_election.cc:270] T ecf3e9d1608a4d03ac69a09f0df54b9e P b4eb8f7b19dd4b94a313d8674779b350 [CANDIDATE]: Term 9 election: Was unable to construct an RPC proxy to peer dddc42c5a10b461cb92465815413e996: Network error: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known. Counting it as a 'NO' vote.
> F0429 04:14:11.046133 801167 raft_consensus.cc:2743] Check failed: _s.ok() Bad status: Network error: Could not obtain a remote proxy to the peer.: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known
> {code}
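
The two log lines suggest that the same resolution failure is handled in two different ways: the election path counts it as a 'NO' vote, while a later check treats it as fatal. The Python sketch below is not Kudu code. It is a minimal illustration, under assumed names (`run_election`, `ResolveError`, `ElectionResult`, and the `resolve` and `request_vote` callbacks are all hypothetical), of treating an unresolvable peer as a recoverable per-peer failure so that a three-replica election can finish without aborting the process.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Vote(Enum):
    YES = "yes"
    NO = "no"


class ResolveError(Exception):
    """Hypothetical stand-in for 'unable to resolve address' errors."""


@dataclass
class ElectionResult:
    won: bool
    votes: Dict[str, Vote]   # votes received from the other voters
    unresolved: List[str]    # peers whose hostname could not be resolved


def run_election(
    peers: Dict[str, str],
    resolve: Callable[[str], str],
    request_vote: Callable[[str, str], Vote],
) -> ElectionResult:
    """Run one election round for a candidate.

    `peers` maps each *other* voter's id to its hostname. A peer that
    cannot be resolved is recorded as a NO vote and remembered for a
    later retry; it never raises out of this function.
    """
    votes: Dict[str, Vote] = {}
    unresolved: List[str] = []
    for peer_id, host in peers.items():
        try:
            addr = resolve(host)
        except ResolveError:
            # Recoverable: count as NO and keep going (assumed policy).
            votes[peer_id] = Vote.NO
            unresolved.append(peer_id)
            continue
        votes[peer_id] = request_vote(peer_id, addr)

    total_voters = len(peers) + 1                 # peers plus the candidate
    majority = total_voters // 2 + 1
    yes_votes = 1 + sum(v is Vote.YES for v in votes.values())  # self-vote
    return ElectionResult(yes_votes >= majority, votes, unresolved)
```

With TS1 as the candidate, TS2 voting yes, and TS3 unresolvable, this sketch reports a win with two of three votes and lists TS3 under `unresolved`, leaving the caller free to retry resolution later.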



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
