What about using a workflow tool like Oozie, Azkaban, or AWS Data
Pipeline? Set it to trigger as soon as the data lands in the S3 bucket and
have it execute the ALTER TABLE command.
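
For reference, the ALTER TABLE itself can be issued over Hive's JDBC
driver, so the workflow job can be a very small program. A minimal sketch;
the table name, partition layout, and HiveServer2 URL below are assumptions,
not from this thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddPartition {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver (hive-jdbc must be on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // IF NOT EXISTS makes the call idempotent, so re-running
            // the workflow for the same day is harmless.
            stmt.execute(
                "ALTER TABLE events ADD IF NOT EXISTS "
                + "PARTITION (dt='2014-07-31') "
                + "LOCATION 's3n://bucket/events/dt=2014-07-31/'");
        }
    }
}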

On Thursday, July 31, 2014, Viral Bajaria <viral.baja...@gmail.com> wrote:

> Any suggestions on this? Still trying to figure out how I can get a
> notification that a new partition is being created by the HDFS sink, so
> that I can add it via an ALTER TABLE statement on a separate thread.
>
> Is adding a callback the right way to handle this?
>
> Thanks,
> Viral
>
>
>
> On Mon, Jul 28, 2014 at 2:40 PM, Viral Bajaria <viral.baja...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is there a way to get the HDFS sink to signal that a file was just
>> closed, and then use that signal to add a partition to Hive if one does
>> not exist already?
>>
>> Right now, what I do is:
>>
>> - move files to S3
>> - run RECOVER PARTITIONS <--- this step takes forever
>>
>> But given how much historical data I have, it's not feasible to run
>> RECOVER PARTITIONS every single day.
>>
>> I would much rather add the partition the first time a file shows up in
>> it.
>>
>> I looked around the code base, and it seems Flume-OG had something like
>> this, but I don't see the capability in Flume-NG.
>>
>> I can see a way to add this by adding another callback parameter to the
>> HdfsEventSink and creating a custom wrapper around it.
>>
>> Any other suggestions?
>>
>> Thanks,
>> Viral
>>
>>
>
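
Short of patching the sink to expose a file-close callback (which, as you
note, Flume-NG does not provide out of the box), another workaround is a
small watcher that polls the sink's output root and registers any partition
directory it has not seen before. A rough sketch, assuming a dt=YYYY-MM-DD
layout under s3n://bucket/events/; all paths and the poll interval are
placeholders:

import java.net.URI;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionWatcher {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
            URI.create("s3n://bucket/"), new Configuration());
        Path root = new Path("s3n://bucket/events/");
        Set<String> known = new HashSet<String>();
        while (true) {
            for (FileStatus status : fs.listStatus(root)) {
                // Partition directories look like dt=2014-07-31.
                String name = status.getPath().getName();
                if (status.isDirectory() && known.add(name)) {
                    // First sighting: add the partition to Hive, e.g. via
                    // the JDBC snippet earlier. Stubbed out here.
                    System.out.println("New partition directory: " + name);
                }
            }
            Thread.sleep(60 * 1000); // poll once a minute
        }
    }
}

This trades the per-file precision of a callback for simplicity: no Flume
changes are needed, and a missed poll just delays the partition by a cycle.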
