Re: for each partition

2011-02-08 Thread Christopher, Pat
If you want to operate over all partitions in a table you don't need to specify the partitions at all. Run your query and enjoy! If you want to specify the partition mapping of the output dataset from a query, I think you can derive that value on a per row basis like so: Partition=substr(dat

for each partition

2011-02-08 Thread Cam Bazz
Hello, How can I run some process for each partition of another table? For example, let's say table A has partitions 1, 2, 3. I want to be able to say: for each partition in A do { select * from A where partition is ? into some othertable where partition is ? } Best Regards, C.B.
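
The loop asked about above could be sketched on the client side by generating one HiveQL statement per partition. This is only a minimal sketch under assumptions: the function name, the partition column `part`, and the column names `col1`, `col2` are hypothetical and not from the thread. The statements are built as strings only; running them through a Hive client is not shown.

```python
def per_partition_statements(source, target, partition_col, columns, partitions):
    """Build one INSERT OVERWRITE statement per partition value.

    `columns` lists the non-partition columns explicitly: with a static
    partition spec the partition column is supplied by the PARTITION clause,
    so it is left out of the SELECT list.
    All table and column names passed in are hypothetical examples.
    """
    column_list = ", ".join(columns)
    statements = []
    for value in partitions:
        statements.append(
            f"INSERT OVERWRITE TABLE {target} "
            f"PARTITION ({partition_col}='{value}') "
            f"SELECT {column_list} FROM {source} "
            f"WHERE {partition_col}='{value}'"
        )
    return statements


if __name__ == "__main__":
    for stmt in per_partition_statements("A", "othertable", "part",
                                         ["col1", "col2"], [1, 2, 3]):
        print(stmt)
```

If no per-partition step is needed, a single query over the whole table avoids the loop entirely, as the reply in this thread points out.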

filtering out crawlers

2011-02-08 Thread Cam Bazz
Hello, Is there a practical way to filter out the logs left by crawlers like Google's? They usually have user-agent strings like Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Is there a database for the
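
One simple approach, sketched below, is to match the user-agent string against a short list of known crawler tokens before the log lines are loaded. The token list and function names here are assumptions for illustration, not an exhaustive crawler database; only `Googlebot` and `bingbot` come from the message above.

```python
import re

# Illustrative tokens only. "googlebot" and "bingbot" appear in the message;
# the other tokens are assumed, commonly seen crawler markers.
BOT_PATTERN = re.compile(r"googlebot|bingbot|crawler|spider|slurp", re.IGNORECASE)


def is_crawler(user_agent):
    """Return True if the user-agent string contains a known crawler token."""
    if not user_agent:
        # Missing or empty user agents are treated as non-crawlers here.
        return False
    return BOT_PATTERN.search(user_agent) is not None


def without_crawlers(user_agents):
    """Keep only the user-agent strings that do not look like crawlers."""
    return [ua for ua in user_agents if not is_crawler(ua)]
```

A substring match like this misses crawlers that send browser-like user agents, so it is a first-pass filter rather than a complete answer.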

Re: periodic execution

2011-02-08 Thread Cam Bazz
Hello, I am looking over Oozie's Coordinator. Meanwhile, I managed to write a simple Java program that connects to Hive using JDBC. I can import data and execute queries. I was wondering: to do workflows, one needs to keep metadata, i.e. which was the last file, partition processed
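
The bookkeeping described above (remembering which files were already processed) can be sketched with a small state file. This is a minimal sketch, assuming a JSON file holding processed file names; the function names and the state-file layout are hypothetical, and a scheduler such as Oozie's Coordinator may make this unnecessary.

```python
import json
import os


def load_processed(state_path):
    """Return the set of file names recorded as already processed."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path) as fh:
        return set(json.load(fh))


def save_processed(state_path, processed):
    """Write the state to a temp file, then rename it over the old state."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(sorted(processed), fh)
    os.replace(tmp_path, state_path)


def new_files(log_dir, processed):
    """List files in log_dir that are not yet recorded as processed."""
    return sorted(name for name in os.listdir(log_dir) if name not in processed)
```

Each run would call `load_processed`, handle whatever `new_files` returns, add those names to the set, and call `save_processed` at the end.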

Re: periodic execution

2011-02-08 Thread Jeff Hammerbacher
Hey Cam, You should use Oozie's Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases. Regards, Jeff On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz wrote: > Hello, > > What kind of strategy must i follow, in order to periodically run > certain things. > > For example, each hour, i w

periodic execution

2011-02-08 Thread Cam Bazz
Hello, What kind of strategy should I follow in order to run certain things periodically? For example, each hour I want to look up log files in a certain directory, and for new files I need to run: load data local inpath '/home/cam/logs/log.2011310120' into table item_view_raw partition (date_hour=20
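
The statement above could be generated per file along these lines. This is a sketch under a loud assumption: the partition value is cut off in the message, so the code simply takes the text after the last "." in the file name as the `date_hour` value, which may not match the real partition format. The function names are hypothetical.

```python
import os


def partition_from_filename(path):
    """Assumption: the text after the last '.' in the file name is the
    partition value. The real format is truncated in the message."""
    return os.path.basename(path).rsplit(".", 1)[-1]


def load_statement(path, table, partition_col, partition_value):
    """Build a 'load data local inpath' statement for a single log file."""
    return (
        f"load data local inpath '{path}' "
        f"into table {table} partition ({partition_col}='{partition_value}')"
    )


if __name__ == "__main__":
    path = "/home/cam/logs/log.2011310120"
    print(load_statement(path, "item_view_raw", "date_hour",
                         partition_from_filename(path)))
```

An hourly job would build one such statement for each new file and submit it to Hive.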