How is concurrency achieved with this solution?

On Thursday, January 10, 2013, Qiang Wang wrote:

> I believe the HWI (Hive Web Interface) can give you a hand.
>
> https://github.com/anjuke/hwi
>
> You can use the HWI to submit and run queries concurrently.
> Partition management can be achieved by creating crontabs using the HWI.
>
> It's simple and easy to use. Hope it helps.
>
> Regards,
> Qiang
>
>
> 2013/1/11 Tom Brown <tombrow...@gmail.com>
>
>> All,
>>
>> I want to automate jobs against Hive (using an external table with
>> ever growing partitions), and I'm running into a few challenges:
>>
>> Concurrency - If I run Hive as a Thrift server, I can only safely run
>> one job at a time. As such, it seems like my best bet will be to run
>> it from the command line and set up a brand new instance for each job.
>> That's quite a bit of hassle to solve a seemingly common problem, so I
>> want to know: are there any accepted patterns or best practices for
>> this?
>>
>> Partition management - New partitions will be added regularly. If I
>> have to set up multiple instances of Hive for each (potentially)
>> overlapping job, it will be difficult to keep track of the partitions
>> that have been added. In the context of the preceding question, what
>> is the best way to add metadata about new partitions?
>>
>> Thanks in advance!
>>
>> --Tom
>>
>
>
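For what it's worth, the per-job command-line approach Tom describes above
can be scripted so that each query runs in its own Hive CLI process; a
minimal sketch, assuming hypothetical query files and log paths:

    # Hypothetical sketch: each job gets its own Hive CLI process, so several
    # can run concurrently instead of sharing a single Thrift server.
    # The .hql files and log paths below are illustrative, not from this thread.
    hive -f /jobs/daily_rollup.hql  > /var/log/hive/daily_rollup.log  2>&1 &
    hive -f /jobs/error_report.hql  > /var/log/hive/error_report.log  2>&1 &
    wait  # block until both background jobs have finished

Each invocation starts a fresh Hive session, so the jobs do not step on each
other; the cost is JVM startup time for every job.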

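The crontab-driven partition management Qiang suggests could look roughly
like the sketch below; the table name, partition column, and HDFS layout are
assumptions for illustration only:

    # Hypothetical crontab entry: once a day, register the newest partition of
    # an external table. ADD IF NOT EXISTS makes the job safe to re-run.
    # Note that % must be escaped as \% inside a crontab line.
    5 0 * * * hive -e "ALTER TABLE logs ADD IF NOT EXISTS PARTITION (dt='$(date +\%Y-\%m-\%d)') LOCATION '/data/logs/dt=$(date +\%Y-\%m-\%d)'" >> /var/log/hive/add_partition.log 2>&1

Depending on your Hive version, MSCK REPAIR TABLE logs may also pick up new
partition directories without listing each one explicitly.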