Re: periodic execution

Cam Bazz Tue, 08 Feb 2011 19:45:32 -0800

Hello,

I am looking into Oozie's coordinator. Meanwhile, I managed to
write a simple Java program that connects to Hive over JDBC.


I can import data and execute queries.

I was wondering: to run workflows, one needs to keep metadata,
i.e. which file or partition was processed last, etc.

I would usually do this with a database like db4o, or by keeping a static file.

Is the Derby database that comes with Hive meant for this purpose? How do
people usually store state when using a Hive application?
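
To make the static-file idea concrete, here is a minimal sketch of what I
have in mind. It is only an illustration under my own assumptions: the class
name ProcessedLog, the state file name, and the "log.<key>" naming rule are
hypothetical, and nothing here comes from Hive or Oozie. It records each
processed time key in a plain text file and builds the LOAD statement for
keys that have not been seen yet.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: remembers which log.<key> files were already loaded. */
public class ProcessedLog {
    private static final String PREFIX = "log.";
    private final Path stateFile;
    private final Set<String> done = new TreeSet<>();

    /** Reads previously recorded keys, one per line, if the state file exists. */
    public ProcessedLog(Path stateFile) throws IOException {
        this.stateFile = stateFile;
        if (Files.exists(stateFile)) {
            done.addAll(Files.readAllLines(stateFile, StandardCharsets.UTF_8));
        }
    }

    /** Returns the sorted time keys of log.<key> files in dir not yet recorded. */
    public List<String> pendingKeys(Path dir) throws IOException {
        List<String> keys = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, PREFIX + "*")) {
            for (Path p : stream) {
                String key = p.getFileName().toString().substring(PREFIX.length());
                if (!key.isEmpty() && !done.contains(key)) {
                    keys.add(key);
                }
            }
        }
        Collections.sort(keys);
        return keys;
    }

    /** Appends the key to the state file so that it survives a restart. */
    public void markDone(String key) throws IOException {
        if (done.add(key)) {
            Files.write(stateFile, Collections.singletonList(key), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    /** Builds the LOAD statement for one key, following the example quoted below. */
    public static String loadStatement(String logDir, String key) {
        return "load data local inpath '" + logDir + "/" + PREFIX + key
                + "' into table item_view_raw partition (date_hour=" + key + ")";
    }
}
```

An hourly job would call pendingKeys() on the log directory, execute the
statement from loadStatement() over JDBC for each key, and call markDone()
only after the query succeeds, so a failed load is retried on the next run.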

best regards,
-C.B.

On Wed, Feb 9, 2011 at 5:23 AM, Jeff Hammerbacher <ham...@cloudera.com> wrote:
> Hey Cam,
> You should use Oozie's
> Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases.
> Regards,
> Jeff
>
> On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz <camb...@gmail.com> wrote:
>>
>> Hello,
>>
>> What kind of strategy should I follow in order to run certain things
>> periodically?
>>
>> For example, each hour, I want to look up log files in a certain dir,
>> and for new files, I need to run:
>>
>> load data local inpath '/home/cam/logs/log.2011310120' into table
>> item_view_raw partition (date_hour=2011310120);
>>
>> FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
>> (date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
>> ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
>> ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
>> ivr.date_hour='2011310120';
>>
>> Obviously, I need to deduce which files are new, iterate over them,
>> and extract the time key, which will be used as the partition name; in
>> this case it is 2011310120.
>>
>> It seems like I can write a Java program to deal with the
>> synchronization of all these tasks, but I was wondering, what would you
>> guys suggest?
>>
>> Any ideas/recommendations/help greatly appreciated.
>>
>> Best Regards,
>> C.B.
>
>
