RE: periodic execution

Balaji Rajagopalan Thu, 10 Feb 2011 00:35:35 -0800

Alejandro,
   I have used your Hive action patch from tucu's forked branch on Yahoo's 
GitHub and it works fine. When will your patch be available in the master 
branch of Yahoo's GitHub? Also, I have a small suggestion, if I may: 
hive-default.xml is bundled with oozie-core.jar. Could we instead keep 
hive-default.xml in the same folder as workflow.xml in HDFS, so that when I 
change hive-default.xml I don't have to bundle the jar again?


Regards,
Balaji

From: Alejandro Abdelnur [mailto:t...@cloudera.com]
Sent: Thursday, February 10, 2011 3:12 AM
To: user@hive.apache.org
Subject: Re: periodic execution

Hi Cam,

A bit of information that may be useful for you: Cloudera's Oozie has a Hive 
action that you can use from workflow jobs.

Cheers

Alejandro

On Wed, Feb 9, 2011 at 11:44 AM, Cam Bazz <camb...@gmail.com> wrote:
Hello,

I am looking over oozie's coordinator. But meanwhile, I managed to
write a simple java program to connect to hive using jdbc.

I can import data and execute queries.

I was wondering: for doing workflows, one needs to keep some
metadata, i.e. which was the last file or partition processed, etc.

I would usually do this using a database like db4o, and keeping a static file.

Is the Derby database that comes with Hive meant for this purpose? How do
people usually store state when using a Hive application?

best regards,
-C.B.
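
For context on the state question above: the Derby database bundled with Hive backs the metastore (table and partition definitions), not application state. One simple alternative, sketched here under assumptions, is a plain text file with one processed partition key per line. The class name ProcessedPartitions and the file layout are hypothetical and not part of Hive or Oozie.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: remembers which partition keys were already processed.
class ProcessedPartitions {
    private final Path stateFile;

    ProcessedPartitions(Path stateFile) {
        this.stateFile = stateFile;
    }

    // Reads all recorded keys; a missing file means nothing was processed yet.
    Set<String> load() throws IOException {
        Set<String> keys = new LinkedHashSet<>();
        if (Files.exists(stateFile)) {
            for (String line : Files.readAllLines(stateFile, StandardCharsets.UTF_8)) {
                String key = line.trim();
                if (!key.isEmpty()) {
                    keys.add(key);
                }
            }
        }
        return keys;
    }

    // Appends one key after its partition has been loaded successfully.
    void markDone(String key) throws IOException {
        Files.write(stateFile, Collections.singletonList(key), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Calling markDone only after the Hive statements for a key succeed means a failed run is retried on the next pass. This sketch does not guard against two processes writing the file at once.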

On Wed, Feb 9, 2011 at 5:23 AM, Jeff Hammerbacher <ham...@cloudera.com> wrote:
> Hey Cam,
> You should use Oozie's
> Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases.
> Regards,
> Jeff
>
> On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz <camb...@gmail.com> wrote:
>>
>> Hello,
>>
>> What kind of strategy should I follow in order to run certain things
>> periodically?
>>
>> For example, each hour, I want to look up log files from a certain dir,
>> and for new files, I need to run:
>>
>> load data local inpath '/home/cam/logs/log.2011310120' into table
>> item_view_raw partition (date_hour=2011310120);
>>
>> FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
>> (date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
>> ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
>> ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
>> ivr.date_hour='2011310120';
>>
>> Obviously, I need to deduce which files are new, iterate over them,
>> and extract the time key, which will be used as the partition name; in
>> this case it is: 2011310120
>>
>> It seems like I can write a Java program to deal with the
>> synchronization of all these tasks, but I was wondering, what would you
>> guys suggest?
>>
>> Any ideas/recommendations/help greatly appreciated.
>>
>> Best Regards,
>> C.B.
>
>
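
The steps described in the original question (find new files, extract the time key, run the two statements per key) could be sketched in Java roughly as follows. This is an illustration under assumptions, not a tested production job: the class HourlyLoadPlanner and its methods are hypothetical, file names are assumed to look like log.<digits>, and the table and column names are copied from the statements quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: works out which hourly log files still need loading
// and builds the HiveQL text for each one. It does not talk to Hive itself.
class HourlyLoadPlanner {

    // Extracts the time key from a name like "log.2011310120".
    // Returns null when the name does not match the assumed pattern.
    static String timeKey(String fileName) {
        String prefix = "log.";
        if (!fileName.startsWith(prefix)) {
            return null;
        }
        String key = fileName.substring(prefix.length());
        // Digits only, so the key is safe to embed in a statement.
        return key.matches("\\d+") ? key : null;
    }

    // Returns the sorted keys of files that are not in the processed set.
    static List<String> newKeys(List<String> fileNames, Set<String> processed) {
        Set<String> keys = new TreeSet<>();
        for (String name : fileNames) {
            String key = timeKey(name);
            if (key != null && !processed.contains(key)) {
                keys.add(key);
            }
        }
        return new ArrayList<>(keys);
    }

    // Builds the load statement for one key, mirroring the one quoted above.
    static String loadStatement(String dir, String key) {
        return "load data local inpath '" + dir + "/log." + key
                + "' into table item_view_raw partition (date_hour=" + key + ")";
    }

    // Builds the insert statement for one key, mirroring the one quoted above.
    static String insertStatement(String key) {
        return "FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition"
                + " (date_hour=" + key + ") SELECT ivr.view_time, ivr.ip_number,"
                + " ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,"
                + " ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and"
                + " ivr.date_hour='" + key + "'";
    }
}
```

Each generated string could then be passed to java.sql.Statement.execute over a Hive JDBC connection, with the key recorded as processed only after both statements succeed. An Oozie coordinator, as suggested in the reply, would replace the scheduling and file detection parts of this sketch.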
