Attila Magyar created HIVE-23469:
------------------------------------

             Summary: Use hostname + pod UID for shuffle manager caching
                 Key: HIVE-23469
                 URL: https://issues.apache.org/jira/browse/HIVE-23469
             Project: Hive
          Issue Type: Bug
          Components: Tez
            Reporter: Attila Magyar
            Assignee: Attila Magyar


When a pod restarts, it comes back with the same hostname and shuffle port. 
Fetcher threads that connect to download shuffle data reuse the cached 
connection info, but because the pod died, its shuffle data has already been 
cleaned up. The restarted pod therefore receives requests from clients for 
specific shuffle data that the daemon no longer has.

In ShuffleManager.java, the key of knownSrcHosts should be changed to HostInfo, 
which combines host+port with the host's unique ID. The host's unique ID 
changes when a node is killed or restarted.
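
A minimal sketch of the proposed keying, not the actual Hive/Tez code: only 
the names knownSrcHosts and HostInfo come from this issue. The field names, 
the CachedHost stand-in, the lookup helper, and the sample host, port and 
UID values are all assumptions made for illustration. It shows that once the 
unique ID is part of the key, a restarted pod with the same host+port no 
longer hits the stale cache entry.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class HostKeySketch {

    /** Hypothetical cache key: host + port plus a per-incarnation unique ID. */
    static final class HostInfo {
        final String host;
        final int port;
        final String uniqueId; // assumed to be e.g. the pod UID; changes on restart

        HostInfo(String host, int port, String uniqueId) {
            this.host = Objects.requireNonNull(host);
            this.port = port;
            this.uniqueId = Objects.requireNonNull(uniqueId);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof HostInfo)) return false;
            HostInfo other = (HostInfo) o;
            return port == other.port
                && host.equals(other.host)
                && uniqueId.equals(other.uniqueId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, uniqueId);
        }
    }

    /** Hypothetical stand-in for whatever per-host state the cache holds. */
    static final class CachedHost {
        final HostInfo info;
        CachedHost(HostInfo info) { this.info = info; }
    }

    // Keyed by HostInfo rather than by host+port alone.
    static final Map<HostInfo, CachedHost> knownSrcHosts = new ConcurrentHashMap<>();

    /** Returns the cached entry for this exact incarnation, creating it if absent. */
    static CachedHost lookup(String host, int port, String uniqueId) {
        return knownSrcHosts.computeIfAbsent(
            new HostInfo(host, port, uniqueId), CachedHost::new);
    }

    public static void main(String[] args) {
        CachedHost before = lookup("llap-0", 15551, "uid-a");
        CachedHost same = lookup("llap-0", 15551, "uid-a");
        CachedHost after = lookup("llap-0", 15551, "uid-b"); // same host+port, new ID

        System.out.println(before == same);  // true: same incarnation reuses the entry
        System.out.println(before == after); // false: restart yields a fresh entry
    }
}
```

In this sketch the stale entry for the old incarnation stays in the map until 
something evicts it; how and when to remove it is left open here.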


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
