If you know make and bash, have a look at Stampede for scheduling work: https://github.com/ThinkBigAnalytics/stampede
(Full disclosure: I wrote it.)

On Thu, Jan 10, 2013 at 4:11 PM, Sean McNamara <sean.mcnam...@webtrends.com> wrote:

> > I want to know if there are any accepted patterns or best practices for
> > this?
>
> http://oozie.apache.org/

With both Stampede and Oozie, you can tell them to watch for certain data to show up, e.g., a _SUCCESS file marker in a directory receiving new data files, and then start a Hive query, etc. You can also add your partition creation commands to the workflow, e.g., as soon as the data is present (or even before; Hive won't care if the data doesn't exist yet).

> > New partitions will be added regularly

When you add a partition, that metadata goes into the metastore, so every Hive instance sharing that metastore will see it. Of course, you should avoid scenarios where multiple processes attempt to create the same partition, although if they are using exactly the same command, adding an IF NOT EXISTS clause will avoid error messages. Still, I wouldn't want to torture-test the metastore...

> What type of partitions are you adding? Why frequently?
>
> Sean
>
> On 1/10/13 3:03 PM, "Tom Brown" <tombrow...@gmail.com> wrote:
>
> >All,
> >
> >I want to automate jobs against Hive (using an external table with
> >ever-growing partitions), and I'm running into a few challenges:
> >
> >Concurrency - If I run Hive as a Thrift server, I can only safely run
> >one job at a time. As such, it seems like my best bet will be to run
> >it from the command line and set up a brand-new instance for each job.
> >That's quite a hassle just to solve a seemingly common problem, so
> >I want to know if there are any accepted patterns or best practices
> >for this?
> >
> >Partition management - New partitions will be added regularly. If I
> >have to set up multiple instances of Hive for each (potentially)
> >overlapping job, it will be difficult to keep track of the partitions
> >that have been added. In the context of the preceding question, what
> >is the best way to add metadata about new partitions?
> >
> >Thanks in advance!
> >
> >--Tom

-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
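A minimal sketch of the workflow step described above (wait for a `_SUCCESS` marker, then register the partition with `ALTER TABLE ... ADD IF NOT EXISTS PARTITION`), written in Python rather than as a Stampede make/bash script or an Oozie workflow. It is not how either tool implements this. The helper names, the `logs` table, the `dt` partition column, and the HDFS paths are hypothetical. It assumes the `hadoop` and `hive` command-line clients are on the PATH, and that partition values contain no single quotes (nothing is escaped).

```python
import posixpath
import subprocess
import time
from typing import Callable, Dict


def add_partition_ddl(table: str, partition: Dict[str, str], location: str) -> str:
    """Build a HiveQL statement that registers one partition.

    IF NOT EXISTS turns a repeated run of the same statement into a no-op
    instead of an error. Values are NOT escaped (assumption: no quotes in them).
    """
    spec = ", ".join(f"{key}='{value}'" for key, value in partition.items())
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({spec}) LOCATION '{location}';")


def hdfs_exists(path: str) -> bool:
    """Return True if the HDFS path exists (`hadoop fs -test -e` exits with 0)."""
    return subprocess.run(["hadoop", "fs", "-test", "-e", path]).returncode == 0


def wait_for_marker(directory: str,
                    exists: Callable[[str], bool] = hdfs_exists,
                    marker: str = "_SUCCESS",
                    poll_seconds: float = 60.0,
                    timeout_seconds: float = 3600.0) -> bool:
    """Poll until directory/marker exists; return False if the timeout expires."""
    path = posixpath.join(directory, marker)  # HDFS paths always use '/'
    deadline = time.monotonic() + timeout_seconds
    while not exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def register_partition_when_ready(table: str, partition: Dict[str, str],
                                  location: str) -> bool:
    """Wait for the data's _SUCCESS marker, then add the partition via `hive -e`."""
    if not wait_for_marker(location):
        return False
    subprocess.run(["hive", "-e", add_partition_ddl(table, partition, location)],
                   check=True)
    return True
```

For example, `register_partition_when_ready("logs", {"dt": "2013-01-10"}, "/data/logs/dt=2013-01-10")` would poll every 60 seconds for up to an hour and then run a single `hive -e` command. Because the statement uses `IF NOT EXISTS`, two overlapping jobs issuing it for the same partition should not fail. As noted above, though, it is still cleaner to let one scheduled process own partition creation.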