Sounds like one-way replication of the metastore. Depending on your metastore 
platform, that could be achieved pretty easily. 

 

Mine is Oracle and I use Materialised View replication, which is pretty good but 
not the latest technology. Other options would be GoldenGate or SAP Replication 
Server.

 

HTH,

 

Mich

 

From: Mich Talebzadeh [mailto:m...@peridale.co.uk] 
Sent: 17 December 2015 16:47
To: user@hive.apache.org
Subject: RE: Synchronizing Hive metastores across clusters

 

Are both clusters in active/active mode, or is the cloud-based cluster a standby?

 

From: Elliot West [mailto:tea...@gmail.com] 
Sent: 17 December 2015 16:21
To: user@hive.apache.org
Subject: Synchronizing Hive metastores across clusters

 

Hello,

 

I'm thinking about the steps required to repeatedly push Hive datasets out from 
a traditional Hadoop cluster into a parallel cloud-based cluster. This is not a 
one-off; it needs to be a constantly running sync process. As new tables and 
partitions are added in one cluster, they need to be synced to the cloud 
cluster. Assuming for a moment that I have the HDFS data syncing working, I'm 
wondering what steps I need to take to reliably ship the HCatalog metadata 
across. I use HCatalog as the point of truth as to when data is available and 
where it is located, and so I think that metadata is a critical element to 
replicate in the cloud-based cluster.

 

Does anyone have any recommendations on how to achieve this in practice? One 
issue (of many, I suspect) is that Hive appears to store table/partition 
locations internally with absolute, fully qualified URLs. Therefore, unless the 
target cloud cluster is similarly named and configured, some path transformation 
step will be needed as part of the synchronisation process.
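
To make the path transformation concrete, here is a minimal sketch of the kind 
of rewrite I have in mind. It only does a prefix swap on the location string. 
The function name, host name, and bucket name are hypothetical and purely for 
illustration; nothing here is a Hive or HCatalog API.

```python
def rewrite_location(location, source_prefix, target_prefix):
    """Map a fully qualified table/partition location onto the target cluster.

    All three arguments are plain strings; the prefixes are assumptions that
    would come from the sync job's own configuration.
    """
    src = source_prefix.rstrip("/")
    dst = target_prefix.rstrip("/")
    # Match on a path boundary so ".../warehouse" does not match ".../warehouse2".
    if location == src or location.startswith(src + "/"):
        return dst + location[len(src):]
    # Fail loudly rather than ship a location the target cluster cannot resolve.
    raise ValueError("location is not under the source prefix: " + location)


# Hypothetical source and target prefixes.
new_location = rewrite_location(
    "hdfs://onprem-nn:8020/warehouse/sales.db/orders/dt=2015-12-17",
    "hdfs://onprem-nn:8020/warehouse",
    "s3a://cloud-bucket/warehouse",
)
```

Raising on an unmatched prefix is deliberate: a location that falls outside the 
expected warehouse root probably needs a human to look at it before it is 
replicated.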

 

I'd appreciate any suggestions, thoughts, or experiences related to this.

 

Cheers - Elliot.

 

 
