Hi Techie,
Have you decided on your HA approach by any chance?

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
Publications due shortly:
Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility.

From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: 24 January 2016 21:57
To: user@hive.apache.org
Subject: RE: Backing up hive database

Hi,

That is a valid question. However, for my two cents, I would move away from Hortonworks and others and consider a solution that can provide high availability (HA) (not to be confused with continuous availability) for both the Hive metastore and HiveServer2.
Depending on your Service Level Agreements (SLAs), the acceptable downtime can be defined. Since Hive is essentially a data warehouse, it does not need to be treated as being as mission critical as a typical transactional system.

First, by definition, the HA components have to be on independent hosts. Since HiveServer2 connects to its metastore via JDBC, it is simply an open client connection. Our financial clients have chosen to put their Hive database on either Oracle or SAP ASE, simply because they already have global licenses for these two products and, besides, they have trained DBAs who regularly monitor and maintain their databases. Both Oracle and SAP provide replication technologies (GoldenGate and SAP Replication Server, for example) that can do database- or table-level replication via Multi-Site Availability (MSA). The Hive database is not particularly complicated, so it takes minimal effort to replicate. This is all about software replication, as opposed to the standard HA setup using a redundant set of hardware similar to what is set up in DR sites.

With regard to HiveServer2 itself, ZooKeeper can be used much like OpenSwitch: multiple HiveServer2 instances register themselves with ZooKeeper, which provides what is known as “dynamic service discovery”. JDBC clients connect to ZooKeeper, which returns the <host>:<port> of a randomly selected registered HiveServer2 instance. The client then uses the returned value to connect directly to that HiveServer2 instance to perform its work. However, this does not provide automatic failover to another HiveServer2. If that HiveServer2 goes down, the Hive clients have to reconnect to ZooKeeper to be directed to a live HiveServer2 node in the collection.
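To make the discovery and reconnect behaviour concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `Registry` is a hypothetical in-memory stand-in for the ZooKeeper namespace (it does not talk to a real ZooKeeper), and the host names, `run_with_reconnect` and the `execute` callback are made up for the example. The URL built by `discovery_url` follows the usual Hive JDBC form for ZooKeeper service discovery, but please check it against the documentation for your Hive version.

```python
import random


class Registry:
    """Hypothetical in-memory stand-in for the ZooKeeper namespace
    in which HiveServer2 instances register themselves."""

    def __init__(self):
        self._live = set()

    def register(self, host, port):
        self._live.add(f"{host}:{port}")

    def deregister(self, host, port):
        # In a real cluster the entry disappears when the instance goes down.
        self._live.discard(f"{host}:{port}")

    def pick(self, rng=random):
        """Return the host:port of one randomly selected live instance."""
        if not self._live:
            raise RuntimeError("no HiveServer2 instance registered")
        return rng.choice(sorted(self._live))


def discovery_url(quorum, namespace="hiveserver2"):
    """Build a JDBC URL that points at the ZooKeeper quorum, not at one server."""
    return (
        f"jdbc:hive2://{','.join(quorum)}/"
        f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


def run_with_reconnect(registry, query, execute, max_attempts=3):
    """Client-side retry: there is no automatic failover, so on a connection
    error the client goes back to the registry and asks for another instance."""
    last_error = None
    for _ in range(max_attempts):
        endpoint = registry.pick()
        try:
            return execute(endpoint, query)
        except ConnectionError as err:
            last_error = err
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

The point of the sketch is that the retry loop lives in the client: the registry only hands out a live address, and any in-flight session on a failed instance is lost and has to be re-established.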
HTH

Dr Mich Talebzadeh

From: Greenhorn Techie [mailto:greenhorntec...@gmail.com]
Sent: 24 January 2016 20:49
To: user@hive.apache.org
Subject: Backing up hive database

Hi,

I am trying to set up Hortonworks Data Platform. I would like to set up Hive in high availability mode (both the metastore and HiveServer2). Along with that, Hortonworks' recommendation is to back up the RDBMS behind the Hive service. Can anyone please let me know what the best practice around this is?
As Hive uses MySQL by default, please advise me on the best way to achieve high availability for all the Hive components.

Many thanks