Provide an automatic recovery feature for Hive Server in case of failure
------------------------------------------------------------------------

                 Key: HIVE-2254
                 URL: https://issues.apache.org/jira/browse/HIVE-2254
             Project: Hive
          Issue Type: New Feature
          Components: Clients, Query Processor, Server Infrastructure
    Affects Versions: 0.7.1, 0.5.0
         Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
            Reporter: Chinna Rao Lalam
            Assignee: Chinna Rao Lalam


*Motivation*
We are doing log analysis with Hive by submitting queries through Hive Server.
We have set up NameNode HA and JobTracker HA for high availability, but Hive
Server is currently a single point of failure. If the machine running Hive
Server goes down or breaks, the Hive service is unavailable until someone
notices the failure and brings the server back up, and our log analysis stops
in the meantime. To avoid this, we need an automatic system that detects the
failure and keeps the server highly available.

*Proposal*
Deploy two Hive Servers: one acts as the active server and the other as a hot
standby. We need a mechanism that decides which server is active and which is
standby, plus a failure detector that notices when the active server is down
or broken and triggers the switchover (standby to active). This failure
detection mechanism will be based on ZooKeeper (the HA Agent).
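The active/standby decision can be sketched with ZooKeeper's standard
leader-election recipe: each Hive Server creates an ephemeral sequential znode
under a shared election path, the owner of the lowest-numbered znode is active,
and when the active server's session expires its znode vanishes and the next
server takes over. This is a minimal illustration under stated assumptions, not
the HA Agent's actual design: `InMemoryZooKeeper` is a hypothetical in-memory
stand-in for a real ensemble, and the path `/hive/ha/election/n_` and the
server names are made up.

```python
from typing import Dict, List, Optional

ELECTION_PREFIX = "/hive/ha/election/n_"  # hypothetical znode path


class InMemoryZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble.

    It models only what the election recipe needs: ephemeral sequential
    znodes that disappear when the session that created them expires.
    """

    def __init__(self) -> None:
        self._seq = 0
        self._nodes: Dict[str, str] = {}  # znode path -> owning session id

    def create_ephemeral_sequential(self, prefix: str, session: str) -> str:
        # Like ZooKeeper, append a 10-digit, zero-padded, increasing counter.
        path = f"{prefix}{self._seq:010d}"
        self._seq += 1
        self._nodes[path] = session
        return path

    def expire_session(self, session: str) -> None:
        # On session expiry, every ephemeral znode the session owned is deleted.
        self._nodes = {p: s for p, s in self._nodes.items() if s != session}

    def children(self) -> List[str]:
        # Zero padding makes lexical order equal to creation order.
        return sorted(self._nodes)

    def owner(self, path: str) -> str:
        return self._nodes[path]


def current_active(zk: InMemoryZooKeeper) -> Optional[str]:
    """The server owning the lowest-numbered znode is active; others are standby."""
    nodes = zk.children()
    return zk.owner(nodes[0]) if nodes else None


if __name__ == "__main__":
    zk = InMemoryZooKeeper()
    zk.create_ephemeral_sequential(ELECTION_PREFIX, "hive-server-1")
    zk.create_ephemeral_sequential(ELECTION_PREFIX, "hive-server-2")
    print(current_active(zk))  # hive-server-1
    zk.expire_session("hive-server-1")  # the active server fails
    print(current_active(zk))  # hive-server-2
```

In a real deployment each standby would set a watch on the znode immediately
preceding its own rather than polling, so only one server is notified when the
active server fails.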

Hive Server clients should be configured with the addresses of both servers.
When opening a connection, the client detects the active Hive Server and
connects to it.
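A client configured with both addresses could pick the active server roughly
as follows. This is a sketch under assumptions: the issue does not say how a
client asks a server whether it is active, so `is_active` is a hypothetical
probe supplied by the caller, and the host names and port below are
placeholders.

```python
from typing import Callable, Sequence, Tuple

Address = Tuple[str, int]


def find_active_server(
    addresses: Sequence[Address],
    is_active: Callable[[Address], bool],
) -> Address:
    """Return the first configured address whose server reports itself active.

    `is_active` is a hypothetical probe; it may raise OSError when the host
    is down or unreachable, which is treated as "not active".
    """
    for address in addresses:
        try:
            if is_active(address):
                return address
        except OSError:
            # An unreachable server cannot be the active one; try the next.
            continue
    raise ConnectionError(f"no active Hive Server among {list(addresses)}")
```

For example, with `[("hive-a.example.com", 10000), ("hive-b.example.com", 10000)]`
configured, the client connects to whichever entry the probe reports as
active. An alternative design is for the client to read the active server's
address from the ZooKeeper election znode instead of probing each server.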

If Hive Server goes down while a query is executing, the query must be
resubmitted once the server is back up, but the originally submitted query
keeps running in the background. Continuing that execution serves no purpose
and wastes cluster resources. In this solution, once the active server is
down, the standby becomes active and ensures that the execution of any
previously submitted query (Hive tasks and MapReduce jobs) is stopped.
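Stopping the orphaned work requires the new active server to know which
MapReduce jobs the failed one launched. The issue does not say where that
information is kept, so this sketch assumes a hypothetical `JobRegistry`
shared between the servers and an injected `kill_job` callback (which in a
Hadoop 0.20 deployment might, for example, wrap `hadoop job -kill <job-id>`).
It only illustrates the ordering: kill each recorded job, and forget it only
once the kill has succeeded.

```python
from typing import Callable, Dict, List


class JobRegistry:
    """Hypothetical shared record of the MapReduce jobs each query launched.

    To survive a crash it would have to live outside the Hive Server process
    (for example under a ZooKeeper znode); here it is a plain dict.
    """

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}  # job id -> query id

    def record(self, query_id: str, job_id: str) -> None:
        # Called by the active server each time a query submits a job.
        self._jobs[job_id] = query_id

    def forget(self, job_id: str) -> None:
        # Also called when a job finishes normally, so it is never killed.
        self._jobs.pop(job_id, None)

    def pending(self) -> List[str]:
        # Snapshot, so callers may forget() entries while iterating.
        return list(self._jobs)


def on_become_active(registry: JobRegistry, kill_job: Callable[[str], None]) -> List[str]:
    """Kill every job left behind by the failed active server.

    If `kill_job` raises, the remaining jobs stay recorded so the cleanup
    can be retried.
    """
    killed: List[str] = []
    for job_id in registry.pending():
        kill_job(job_id)
        registry.forget(job_id)  # forget only after a successful kill
        killed.append(job_id)
    return killed
```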

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
