I was planning to use the notify/callback to add an entry to some external queue system or to trigger a workflow. Without knowing which partition was just written to, that is very tough.
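To make it concrete, here is a rough sketch of the hook I have in mind. The FileClosedCallback interface is made up (the sink exposes nothing like it today), and the BlockingQueue is just a stand-in for whatever external queue system ends up receiving the notification:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PartitionNotifier {

  // Hypothetical callback -- this is the hook being proposed,
  // not something Flume NG's HDFSEventSink provides today.
  public interface FileClosedCallback {
    void onFileClosed(String path);
  }

  public static void main(String[] args) throws InterruptedException {
    // Stand-in for the external queue system (SQS, Kafka, etc.).
    final BlockingQueue<String> partitionQueue =
        new LinkedBlockingQueue<String>();

    FileClosedCallback callback = new FileClosedCallback() {
      @Override
      public void onFileClosed(String path) {
        // e.g. "hdfs://nn/warehouse/events/dt=2014-08-04/part-0001"
        // The parent directory is the partition that just received data.
        String partitionDir = path.substring(0, path.lastIndexOf('/'));
        partitionQueue.offer(partitionDir);
      }
    };

    // Simulate the sink closing a file.
    callback.onFileClosed("hdfs://nn/warehouse/events/dt=2014-08-04/part-0001");
    System.out.println("partition to register: " + partitionQueue.take());
  }
}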
Plus, not everything is written to an S3 bucket; some files are written to HDFS too. How would a workflow tool help with that? Any ideas/pointers? (I've put a rough sketch of the Hive-side call I have in mind at the bottom of this mail.)

On Mon, Aug 4, 2014 at 10:23 PM, Andrew Ehrlich <ehrlic...@gmail.com> wrote:

> What about using a workflow tool like Oozie, Azkaban, or Amazon Data
> Pipeline? Set them to trigger as soon as the S3 bucket is available and
> execute the ALTER TABLE command.
>
>
> On Thursday, July 31, 2014, Viral Bajaria <viral.baja...@gmail.com> wrote:
>
>> Any suggestions on this? Still trying to figure out how I can get a
>> notification that a new partition is being created by the HDFS sink so
>> that I can add it via an ALTER TABLE statement on a separate thread.
>>
>> Is adding a callback the right way to handle this?
>>
>> Thanks,
>> Viral
>>
>>
>> On Mon, Jul 28, 2014 at 2:40 PM, Viral Bajaria <viral.baja...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Is there a way to get the HDFS sink to signal that a file was just
>>> closed, and then use that signal to add a partition to Hive if one
>>> does not exist already?
>>>
>>> Right now, what I do is:
>>>
>>> - move files to S3
>>> - run recover partitions <--- this step takes forever
>>>
>>> Given how much historical data I have, it's not feasible to run
>>> recover partitions every single day.
>>>
>>> I would much rather add a partition the first time I see a file land
>>> in it.
>>>
>>> I looked around the code base and it seems Flume-OG had something
>>> like this, but I don't see that capability in Flume-NG.
>>>
>>> I can see a way to add this by passing another Callback parameter to
>>> the HdfsEventSink and creating a custom wrapper around it.
>>>
>>> Any other suggestions?
>>>
>>> Thanks,
>>> Viral
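P.S. For completeness, this is roughly the Hive-side call I want the callback to end up firing. It's only a sketch: the table name, partition column, and connection URL are all made up, but the ADD IF NOT EXISTS is what makes it cheap compared to a full recover partitions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddPartition {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver; URL, table, and partition values are made up.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn =
        DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default");
    try {
      Statement stmt = conn.createStatement();
      // IF NOT EXISTS makes this safe to fire on every file-close event.
      // Only the LOCATION differs between the HDFS and S3 cases.
      stmt.execute(
          "ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt='2014-08-04') "
              + "LOCATION 'hdfs://nn/warehouse/events/dt=2014-08-04'");
      stmt.close();
    } finally {
      conn.close();
    }
  }
}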