[jira] [Commented] (HIVE-15102) Hiveptest is killing nodes where IP is reused after previous node termination

Ashutosh Chauhan (JIRA) Tue, 21 Nov 2017 13:27:33 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261509#comment-16261509
 ]


Ashutosh Chauhan commented on HIVE-15102:
-----------------------------------------

In some of the recent test runs, a few batches are timing out (randomly, of 
course, and rarely). I looked into the log of one such failure and found that 
it contains the following:
{code}
2017-11-18T10:30:01,231  WARN [Fetcher_O {Map_1} #0] orderedgrouped.FetcherOrderedGrouped: Failed to connect to hive-ptest-slaves-aff.c.gcp-hive-upstream.internal:0 with 1 inputs
java.io.IOException: Failed to connect to http://hive-ptest-slaves-aff.c.gcp-hive-upstream.internal:0/mapOutput?job=job_1511029513075_0001&dag=203&reduce=0&map=attempt_1511029513075_0001_203_00_000000_0_11576, #connectionFailures=3
        at org.apache.tez.http.HttpConnection.connect(HttpConnection.java:168) ~[tez-runtime-library-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
{code}

This suggests to me that some slaves went away in the middle of test 
execution, which resulted in those timeouts.

> Hiveptest is killing nodes where IP is reused after previous node termination
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-15102
>                 URL: https://issues.apache.org/jira/browse/HIVE-15102
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-15102.1.patch
>
>
> NO PRECOMMIT TESTS
> The Hiveptest framework has a background thread that runs every hour and 
> attempts to kill zombie nodes that are no longer being used by the test 
> execution. 
> These killed nodes are kept in a list of terminated nodes, and the next time 
> the background thread runs, it attempts to kill all of those nodes again 
> because Hiveptest considers them zombie nodes.
> The problem is that cloud providers can assign the same IP addresses to new 
> nodes, and when the background thread runs, it will kill those new nodes, 
> which may still be in use by Hiveptest.
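
To make the failure mode in the description concrete, here is a minimal sketch. It is not taken from the Hiveptest code base or from HIVE-15102.1.patch; the class, method, and field names are all hypothetical, and the instance-ID approach is only one possible way to avoid the collision, assuming the cloud provider exposes a per-node identifier that is never recycled.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; none of these names come from Hiveptest. */
public class ZombieReaperSketch {

    /** A cloud node. Assumption: instanceId is unique per node, ip may be recycled. */
    public static final class Node {
        public final String instanceId;
        public final String ip;

        public Node(String instanceId, String ip) {
            this.instanceId = instanceId;
            this.ip = ip;
        }
    }

    // Terminated nodes remembered by IP (the behaviour the issue describes).
    private final Set<String> terminatedIps = new HashSet<>();
    // Terminated nodes remembered by instance ID (an assumed alternative).
    private final Set<String> terminatedIds = new HashSet<>();

    /** Records that a node was killed by the background thread. */
    public void recordTermination(Node node) {
        terminatedIps.add(node.ip);
        terminatedIds.add(node.instanceId);
    }

    /** Matching by IP: a new node that reuses an old IP looks like a zombie. */
    public boolean wouldKillByIp(Node node) {
        return terminatedIps.contains(node.ip);
    }

    /** Matching by instance ID: a new node with a recycled IP is left alone. */
    public boolean wouldKillById(Node node) {
        return terminatedIds.contains(node.instanceId);
    }

    public static void main(String[] args) {
        ZombieReaperSketch reaper = new ZombieReaperSketch();

        // The hourly thread kills an unused node and remembers it.
        reaper.recordTermination(new Node("i-old", "10.0.0.5"));

        // The provider later hands the same IP to a fresh, in-use node.
        Node fresh = new Node("i-new", "10.0.0.5");

        System.out.println("kill by IP: " + reaper.wouldKillByIp(fresh)); // true
        System.out.println("kill by ID: " + reaper.wouldKillById(fresh)); // false
    }
}
```

With IP-based bookkeeping the fresh node is flagged on the next run of the background thread, which would match the slaves disappearing in the middle of a test run; keying the terminated list on an identifier that is not reused avoids that particular collision.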



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
