Hello, I am looking over Oozie's Coordinator. In the meantime, I managed to write a simple Java program that connects to Hive using JDBC.
I can import data and execute queries. I was wondering: to run workflows, one needs to keep metadata, i.e. which file or partition was processed last, etc. I would usually do this using a database like db4o, and keeping a static file. Is the Derby database that comes with Hive meant for this purpose? How do people usually store state when using a Hive application?

Best regards,
-C.B.

On Wed, Feb 9, 2011 at 5:23 AM, Jeff Hammerbacher <ham...@cloudera.com> wrote:
> Hey Cam,
> You should use Oozie's
> Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases.
> Regards,
> Jeff
>
> On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz <camb...@gmail.com> wrote:
>>
>> Hello,
>>
>> What kind of strategy must I follow in order to periodically run
>> certain things?
>>
>> For example, each hour, I want to look up log files in a certain dir,
>> and for new files, I need to run:
>>
>> load data local inpath '/home/cam/logs/log.2011310120' into table
>> item_view_raw partition (date_hour=2011310120);
>>
>> FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
>> (date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
>> ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
>> ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
>> ivr.date_hour='2011310120';
>>
>> Obviously, I need to deduce which files are new, iterate over them,
>> and extract the time key, which will be used as the partition name; in
>> this case it is 2011310120.
>>
>> It seems like I can write a Java program to deal with the
>> synchronization of all these tasks, but I was wondering, what would you
>> guys suggest?
>>
>> Any ideas/recommendations/help greatly appreciated.
>>
>> Best Regards,
>> C.B.
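
For what it's worth, the hourly steps from the original question (find new log files, extract the time key, run the load and the insert) could be sketched in plain Java roughly as below. This is only a sketch under assumptions: the class name HourlyHiveLoader, the state file name processed_keys.txt, and the "log.<10 digits>" file naming pattern are made up for illustration. The statements are printed rather than executed; with a real Hive JDBC connection they would presumably go through java.sql.Statement.execute. State is kept in a plain text file of processed keys, one per line, in the spirit of the static file mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HourlyHiveLoader {
    // Assumed naming scheme, taken from the example in the thread: log.2011310120
    private static final Pattern LOG_NAME = Pattern.compile("log\\.(\\d{10})");

    // Returns the 10-digit time key if the file name matches, otherwise empty.
    static Optional<String> extractKey(String fileName) {
        Matcher m = LOG_NAME.matcher(fileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Reads the keys that were already processed (one per line); empty if no state yet.
    static Set<String> loadProcessed(Path stateFile) throws IOException {
        if (!Files.exists(stateFile)) {
            return new TreeSet<>();
        }
        return new TreeSet<>(Files.readAllLines(stateFile, StandardCharsets.UTF_8));
    }

    // Lists log files in logDir whose key is not yet in the processed set, sorted by path.
    static List<Path> findNewFiles(Path logDir, Set<String> processed) throws IOException {
        try (Stream<Path> files = Files.list(logDir)) {
            return files
                .filter(p -> extractKey(p.getFileName().toString())
                    .filter(k -> !processed.contains(k))
                    .isPresent())
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Builds the two statements from the thread for one file; no trailing ';'
    // because they are meant to be passed to a JDBC Statement one at a time.
    static List<String> statementsFor(Path file, String key) {
        String load = "load data local inpath '" + file.toAbsolutePath()
            + "' into table item_view_raw partition (date_hour=" + key + ")";
        String insert = "FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view"
            + " partition (date_hour=" + key + ")"
            + " SELECT ivr.view_time, ivr.ip_number, ivr.session_id, ivr.session_cookie,"
            + " ivr.eser_sid, ivr.sale_status, ivr.maker_name, ivr.title"
            + " WHERE ivr.log_tag = 'PROD' and ivr.date_hour='" + key + "'";
        return Arrays.asList(load, insert);
    }

    // Appends the key to the state file so the next run skips this file.
    static void markProcessed(Path stateFile, String key) throws IOException {
        Files.write(stateFile,
            (key + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Paths.get(args.length > 0 ? args[0] : "/home/cam/logs");
        Path stateFile = Paths.get(args.length > 1 ? args[1] : "processed_keys.txt");
        if (!Files.isDirectory(logDir)) {
            System.err.println("No such log directory: " + logDir);
            return;
        }
        Set<String> processed = loadProcessed(stateFile);
        for (Path file : findNewFiles(logDir, processed)) {
            String key = extractKey(file.getFileName().toString()).get();
            for (String sql : statementsFor(file, key)) {
                // Placeholder: execute via the Hive JDBC connection instead of printing.
                System.out.println(sql);
            }
            // Record the key only after both statements were issued for this file.
            markProcessed(stateFile, key);
        }
    }
}
```

Run from cron each hour, something like this would pick up only the files whose keys are not yet in the state file. If a statement fails halfway, the key is not recorded and the file is retried on the next run, which should be harmless here since the insert overwrites its partition.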