I was planning to use the notify/callback to add an entry in some external
queue system or trigger a workflow. Without knowing which partition just
got written to, that is very tough.
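If the callback handed over the path of the file that was just closed, the
partition could be derived from the path itself. A rough sketch, assuming a
partition-style layout like .../dt=2014-08-04/... (the class and method
names are made up for illustration):

    // Hypothetical helper: pull the Hive partition spec out of the path
    // of a just-closed file, assuming a .../dt=YYYY-MM-DD/... layout.
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PartitionFromPath {
        private static final Pattern DT =
                Pattern.compile("dt=(\\d{4}-\\d{2}-\\d{2})");

        public static String partitionSpec(String filePath) {
            Matcher m = DT.matcher(filePath);
            return m.find() ? "dt='" + m.group(1) + "'" : null;
        }

        public static void main(String[] args) {
            // prints dt='2014-08-04'
            System.out.println(partitionSpec(
                    "s3n://bucket/events/dt=2014-08-04/part-0001"));
        }
    }

Only the path matters here, so the same parsing would work for files
landing in S3 or in HDFS.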

Plus, not everything is always written to an S3 bucket; some files are
written to HDFS too.

How will a workflow tool help with this? Any ideas or pointers on that?

On Mon, Aug 4, 2014 at 10:23 PM, Andrew Ehrlich <ehrlic...@gmail.com> wrote:

> What about using a workflow tool like Oozie, Azkaban, or Amazon Data
> Pipeline? Set one up to trigger as soon as the data lands in the S3
> bucket and execute the ALTER TABLE command.
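>
> For example, the triggered job could issue the ADD PARTITION statement
> over Hive JDBC. A minimal sketch, assuming the hive-jdbc driver is on
> the classpath and with made-up host, table, and partition values:
>
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.Statement;
>
>     public class AddPartition {
>         public static void main(String[] args) throws Exception {
>             // HiveServer2 URL, credentials, table, and partition
>             // below are all placeholders
>             try (Connection conn = DriverManager.getConnection(
>                     "jdbc:hive2://hive-host:10000/default", "user", "");
>                  Statement stmt = conn.createStatement()) {
>                 stmt.execute("ALTER TABLE events ADD IF NOT EXISTS "
>                         + "PARTITION (dt='2014-08-04') "
>                         + "LOCATION 's3n://bucket/events/dt=2014-08-04/'");
>             }
>         }
>     }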
>
>
> On Thursday, July 31, 2014, Viral Bajaria <viral.baja...@gmail.com> wrote:
>
>> Any suggestions on this? I'm still trying to figure out how to get a
>> notification that a new partition is being created by the HDFS sink so
>> that I can add it via an ALTER TABLE statement on a separate thread.
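>>
>> If the sink did expose such a notification, the Hive call could be kept
>> off the sink's own thread with a single-threaded executor. A sketch of
>> that shape (class and method names are hypothetical):
>>
>>     import java.util.concurrent.ExecutorService;
>>     import java.util.concurrent.Executors;
>>
>>     public class PartitionAdder {
>>         private final ExecutorService pool =
>>                 Executors.newSingleThreadExecutor();
>>
>>         // would be invoked by the (hypothetical) sink callback
>>         public void onFileClosed(final String path) {
>>             pool.submit(new Runnable() {
>>                 public void run() {
>>                     // derive the partition from the path and issue
>>                     // ALTER TABLE ... ADD IF NOT EXISTS PARTITION here
>>                 }
>>             });
>>         }
>>     }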
>>
>> Is adding a callback the right way to handle this?
>>
>> Thanks,
>> Viral
>>
>>
>>
>> On Mon, Jul 28, 2014 at 2:40 PM, Viral Bajaria <viral.baja...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Is there a way to get the HDFS sink to signal that a file was just
>>> closed, and then use that signal to add a partition to Hive if one
>>> does not already exist?
>>>
>>> Right now, what I do is:
>>>
>>> - move the files to S3
>>> - run RECOVER PARTITIONS <--- this step takes forever.
>>>
>>> But given how much historical data I have, it's not feasible to run
>>> RECOVER PARTITIONS every single day when each run takes that long.
>>>
>>> I would much rather add the new partition the first time I see a file
>>> show up in it.
>>>
>>> I looked around the code base, and it seems Flume-OG had something
>>> like this, but I don't see the capability in Flume-NG.
>>>
>>> I can see a way of adding this by passing another Callback parameter
>>> to the HdfsEventSink and creating a custom wrapper around it.
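>>>
>>> Roughly the shape I have in mind; hypothetical, since nothing like
>>> this exists in Flume-NG today:
>>>
>>>     // Hypothetical hook the sink would invoke after closing a file.
>>>     public interface FileClosedCallback {
>>>         // absolute path of the file the HDFS sink just closed
>>>         void onFileClosed(String path);
>>>     }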
>>>
>>> Any other suggestions?
>>>
>>> Thanks,
>>> Viral
>>>
>>>
>>
