Mark,

We basically do option 4. We have a simple Hive script that contains all the
"create external table" statements, and we run that script as step 1 of the EMR
jobs we spin up. Our "real" processing then takes over in step 2 and beyond.
We're only working with about 50 tables, so it's pretty manageable. A side
benefit is that we can keep this create-table script under source control to
track our schema changes over time.
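
For illustration only, a script of this kind might look like the sketch below.
It is not our actual script: the file name, table name, columns, and S3 paths
are hypothetical placeholders. The "if not exists" clauses make the script safe
to re-run on a cluster that already has the tables defined.

```sql
-- create_tables.q (hypothetical name): run as step 1 of each EMR job flow.
-- The table, columns, and S3 locations are placeholders, not a real schema.

CREATE EXTERNAL TABLE IF NOT EXISTS page_views (
  view_time STRING,
  user_id   BIGINT,
  url       STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 's3://example-bucket/warehouse/page_views/';

-- Partition metadata also lives in the metastore, so on a fresh cluster
-- each partition has to be registered again before it can be queried.
ALTER TABLE page_views ADD IF NOT EXISTS
  PARTITION (dt = '2012-03-06')
  LOCATION 's3://example-bucket/warehouse/page_views/dt=2012-03-06/';
```

Because the tables are external, dropping them or tearing down the cluster
leaves the underlying S3 data untouched; only the metadata is rebuilt each time.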

Jeff Sternberg
S&P Capital IQ
www.spcapitaliq.com

-----Original Message-----
From: Mark Grover [mailto:mgro...@oanda.com] 
Sent: Tuesday, March 06, 2012 9:54 PM
To: user@hive.apache.org
Cc: Baiju Devani; Denys Berestyuk
Subject: Amazon EMR Best Practices for Hive metastore

Hi all,
I am trying to get an idea of what people do for setting up Hive metastore when 
using Amazon EMR.

For those of you using Amazon EMR:

1) Do you have a dedicated RDS instance external to your EMR Hive+Hadoop 
cluster that you use as a persistent metastore for all your cluster 
instantiations?

2) Do you use the MySQL DB that comes pre-installed on the master node, export 
its data to something like S3 on cluster teardown, and import it from S3 
during cluster bring-up?

3) Do you use a local installation of Hive (instead of that on EMR) so that you 
could make use of an in-house dedicated metastore while utilizing Hadoop 
cluster on EMR? (i.e. local Hive + EMR Hadoop)

4) Do you do something really simple and naive like scripting up all your 
"create external table" commands and running them every time you bring up a 
cluster?

Or do you do something else not mentioned above? :-)

Thank you in advance for sharing!

Mark

Mark Grover, Business Intelligence Analyst OANDA Corporation 

www: oanda.com www: fxtrade.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 