Looks like I am not doing a good job of explaining my requirements. My program is like a workflow engine: it reads a script/configuration file, and only after reading that file does it know which metadata to read from Hive. For example, here is a simplified version of the script file:
== Example Input script ==
input: type=hive, db=test, table=sample, partitions=*
output: type=hive, db=test2, table=sample2, partitions=*
program: type=exec, command=run.sh
== END ==

Now, after reading this script file, my program would like to look up all partition information for the test.sample table in Hive.

-- Parag

From: Nitin Pawar <nitinpawar...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, 12 February 2013 1:55 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: How to load hive metadata from conf dir

In our case we needed to access Hive metadata inside our Oozie workflows. We were using HCatalog as our Hive metadata store, and it was easy to access table metadata directly via the HCatalog APIs.

Parag, will it be possible for you guys to change your metadata store? If not, then you will need to write a node in your workflow which gets the metadata and stores it for the next nodes in the workflow, as Mark said.

On Tue, Feb 12, 2013 at 1:48 PM, Parag Sarda <psa...@walmartlabs.com> wrote:

Thanks Mark for your reply. My program is like a workflow management application, and it runs on a client machine, not on the Hadoop cluster. I use 'hadoop jar' so that my application has access to DFS and the Hadoop API. I would also like my application to have access to Hive metadata the same way it has access to DFS. Users can then write the rules for their workflows against Hive metadata. Since the users of my application are already using Hive, I need to support Hive metadata and I cannot ask them to move to HCatalog.
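For illustration, the `section: key=value, key=value` script format shown at the top of this message could be parsed with a short sketch like the following. The `ScriptParser` class and its method are hypothetical names chosen for this example, not part of any Hive or Hadoop API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a parser for the example script format above.
// All names here are hypothetical, invented for this illustration.
class ScriptParser {

    // Parses lines of the form
    //   "input: type=hive, db=test, table=sample, partitions=*"
    // into a map: section name -> (key -> value).
    // Lines starting with "==" (the block markers) are skipped.
    static Map<String, Map<String, String>> parse(String script) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        for (String line : script.split("\n")) {
            line = line.trim();
            int colon = line.indexOf(':');
            if (line.isEmpty() || line.startsWith("==") || colon < 0) continue;
            String section = line.substring(0, colon).trim();
            Map<String, String> props = new LinkedHashMap<>();
            for (String pair : line.substring(colon + 1).split(",")) {
                String[] kv = pair.trim().split("=", 2);
                if (kv.length == 2) props.put(kv[0].trim(), kv[1].trim());
            }
            sections.put(section, props);
        }
        return sections;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> s =
            parse("input: type=hive, db=test, table=sample, partitions=*");
        System.out.println(s.get("input").get("db")); // prints "test"
    }
}
```

With the parsed map in hand, the engine can read `sections.get("input")` to learn which database and table (here test.sample) to query the metastore for.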
Thanks again,
Parag

From: Mark Grover <grover.markgro...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, 12 February 2013 10:27 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: How to load hive metadata from conf dir

Hi Parag,

I think your question boils down to: how does one access Hive metadata from MapReduce jobs?

In the past, when I've had to write MR jobs that needed Hive metadata, I ended up writing a wrapper Hive query that used a custom mapper and reducer via Hive's transform functionality. However, if you want to stick with an MR job, you seem to be along the right lines.

Also, it seems that the premise of HCatalog (http://incubator.apache.org/hcatalog/docs/r0.4.0/) is to make metadata access among Hive, Pig and MR easier. Perhaps you want to take a look at that and see if it fits your use case?

Mark

On Mon, Feb 11, 2013 at 2:59 PM, Parag Sarda <psa...@walmartlabs.com> wrote:

Hello Hive users,

I am writing a program in Java which is bundled as a JAR and executed using the 'hadoop jar' command. I would like to access Hive metadata (read partition information) in this program. I can ask the user to set the HIVE_CONF_DIR environment variable before calling my program, or ask for any other reasonable parameters to be passed. If possible, I do not want to force the user to run the Hive metastore service, to increase the reliability of the program by avoiding external dependencies.
What is the recommended way to get partition information? Here is my understanding:

1. Make sure my JAR is bundled with the hive-metastore [1] library.
2. Use HiveMetaStoreClient [2].

Is this correct? If yes, how do I read the Hive configuration [3] from HIVE_CONF_DIR?

[1] http://mvnrepository.com/artifact/org.apache.hive/hive-metastore
[2] http://hive.apache.org/docs/r0.7.1/api/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.html
[3] http://hive.apache.org/docs/r0.7.1/api/org/apache/hadoop/hive/conf/HiveConf.html

Thanks in advance,
Parag

--
Nitin Pawar
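The two steps in the question (bundle hive-metastore, use HiveMetaStoreClient) could be combined with HiveConf roughly as in the sketch below. This is a non-authoritative sketch, assuming Hive 0.7-era APIs with hive-metastore and its dependencies on the classpath; it will not compile or run without a Hive installation, and the `test.sample` table names come from the thread's example. HiveConf normally picks up hive-site.xml from the classpath, so one way to honour HIVE_CONF_DIR is to add that directory's hive-site.xml as an extra resource:

```java
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class PartitionLister {
    public static void main(String[] args) throws Exception {
        // HiveConf reads hive-site.xml from the classpath by default; to
        // honour HIVE_CONF_DIR explicitly, add its hive-site.xml as a resource.
        HiveConf conf = new HiveConf();
        String confDir = System.getenv("HIVE_CONF_DIR");
        if (confDir != null) {
            conf.addResource(new Path(confDir, "hive-site.xml"));
        }

        // If hive-site.xml configures an embedded (local) metastore via JDBC
        // settings, no separate metastore service needs to be running.
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
            // (short) -1 means no limit on the number of partitions returned.
            List<Partition> parts = client.listPartitions("test", "sample", (short) -1);
            for (Partition p : parts) {
                System.out.println(p.getValues()); // partition key values
            }
        } finally {
            client.close();
        }
    }
}
```

Whether the embedded-metastore route works without an external service depends on how hive-site.xml is configured (a direct JDBC connection to the metastore database versus a thrift URI), which is exactly the trade-off the original question raises.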