Try Azkaban. We use it here at ngmoco to run MapReduce jobs (not Hive queries)
and it's pretty good: http://sna-projects.com/azkaban/

Also, it is quick to learn and easy to set up. I have never worked with Oozie,
so I can't compare the two, but you can look it up.
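
Whichever scheduler ends up triggering it, the hourly step described in the
quoted message could be sketched roughly as below. This is only an
illustration under assumptions, not an Azkaban or Oozie feature: the
processed-file list kept in a JSON state file and all function names are
hypothetical, and the "log.<time key>" naming is inferred from the single
example path in the original question. Table and column names are copied from
the quoted statements.

```python
import json
import re
from pathlib import Path

# Assumed naming, inferred from the example path: log.<time key>
LOG_NAME = re.compile(r"^log\.(\d+)$")


def load_processed(state_file):
    """Return the set of file names already loaded (empty on the first run)."""
    path = Path(state_file)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_processed(state_file, processed):
    """Persist the set of loaded file names as a sorted JSON list."""
    Path(state_file).write_text(json.dumps(sorted(processed)))


def new_log_files(log_dir, processed):
    """Yield (path, time_key) for log files not yet recorded as processed."""
    for path in sorted(Path(log_dir).iterdir()):
        match = LOG_NAME.match(path.name)
        if match and path.name not in processed:
            yield path, match.group(1)


def hive_statements(path, time_key):
    """Build the two statements from the quoted message for one log file."""
    load = (
        f"LOAD DATA LOCAL INPATH '{path}' INTO TABLE item_view_raw "
        f"PARTITION (date_hour={time_key})"
    )
    insert = (
        "FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view "
        f"PARTITION (date_hour={time_key}) "
        "SELECT ivr.view_time, ivr.ip_number, ivr.session_id, "
        "ivr.session_cookie, ivr.eser_sid, ivr.sale_status, "
        "ivr.maker_name, ivr.title "
        f"WHERE ivr.log_tag = 'PROD' AND ivr.date_hour='{time_key}'"
    )
    return [load, insert]
```

The generated statements could then be run through the existing JDBC
connection or with "hive -e". It is probably safest to add a file name to the
processed set, and call save_processed, only after both of its statements
have succeeded, so that a failed hour is retried on the next run.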


On Feb 8, 2011, at 7:44 PM, Cam Bazz wrote:

> Hello,
> 
> I am looking over Oozie's coordinator. Meanwhile, I managed to
> write a simple Java program that connects to Hive using JDBC.
> 
> I can import data and execute queries.
> 
> I was wondering: for doing workflows, one needs to keep some
> metadata, i.e. which was the last file or partition processed, etc.
> 
> I would usually do this using a database like db4o and keeping a static file.
> 
> Is the Derby database that comes with Hive meant for this purpose? How do
> people usually store state when using a Hive application?
> 
> best regards,
> -C.B.
> 
> On Wed, Feb 9, 2011 at 5:23 AM, Jeff Hammerbacher <ham...@cloudera.com> wrote:
>> Hey Cam,
>> You should use Oozie's
>> Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases.
>> Regards,
>> Jeff
>> 
>> On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz <camb...@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> What kind of strategy should I follow in order to run certain things
>>> periodically?
>>> 
>>> For example, each hour, I want to look up log files in a certain dir,
>>> and for each new file, I need to run:
>>> 
>>> load data local inpath '/home/cam/logs/log.2011310120' into table
>>> item_view_raw partition (date_hour=2011310120);
>>> 
>>> FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
>>> (date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
>>> ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
>>> ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
>>> ivr.date_hour='2011310120';
>>> 
>>> Obviously, I need to deduce which files are new, iterate over them,
>>> and extract the time key, which will be used as the partition name; in
>>> this case it is 2011310120.
>>> 
>>> It seems like I can write a Java program to deal with the
>>> synchronization of all these tasks, but I was wondering, what would you
>>> guys suggest?
>>> 
>>> Any ideas/recommendations/help greatly appreciated.
>>> 
>>> Best Regards,
>>> C.B.
>> 
>> 
