Thanks Timo & Ashish for your input. I will definitely have a look at the Kite SDK (I was not aware of it). Otherwise I'll try to prototype something and share it with the community through a JIRA issue.
--
Christophe

On Mon, Mar 26, 2018 at 1:34 PM, ashish pok <ashish...@yahoo.com> wrote:

> Hi Christophe,
>
> Have you looked at the Kite SDK? We do something like this, but using
> Gobblin and the Kite SDK, which is a pipeline parallel to Flink. It
> feels like if you partition by something logical like the topic name,
> you should be able to sink using the Kite SDK. Kite gives you good ways
> to handle further partitioning (for example by timestamp) and also
> schema evolution if you are using Avro.
>
> -- Ashish
>
> On Mon, Mar 26, 2018 at 4:57 AM, Timo Walther <twal...@apache.org> wrote:
>
>> Hi Christophe,
>>
>> I think this will require more effort. As far as I know there is no
>> such "dynamic" feature. Have you looked into the bucketing sink code?
>> Maybe you can adapt it to your needs?
>>
>> Otherwise it might also make sense to open an issue to discuss a
>> design for it. Maybe other contributors are interested in this
>> feature as well.
>>
>> Regards,
>> Timo
>>
>> Am 23.03.18 um 18:20 schrieb Christophe Jolif:
>>
>>> Hi all,
>>>
>>> I'm using the nice topic-pattern feature on the KafkaConsumer to
>>> read from multiple topics, automatically discovering new topics
>>> added to the system.
>>>
>>> At the end of the processing I'm sinking the result into a Hadoop
>>> filesystem using a BucketingSink.
>>>
>>> All works great until I get the requirement to sink into a
>>> different Hadoop filesystem based on the input topic.
>>>
>>> One way to do this would obviously be to get rid of the topic
>>> pattern and start a (similar) job per topic, each with its own sink
>>> to its own filesystem, and start new jobs when new topics are
>>> added. But that's far from ideal: it leads to the usual issues with
>>> Flink and a dynamic number of jobs (requiring new task slots...),
>>> and it would also require some external machinery to know that new
>>> topics have been added and to create the new jobs.
>>> What would be the recommended way to have a "dynamic" BucketingSink
>>> that can not only write to several base paths (not too hard I
>>> guess) but also dynamically add new base paths when new topics come
>>> into the system?
>>>
>>> Thanks,
>>> --
>>> Christophe
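For what it's worth, here is a minimal sketch of the topic-to-path routing such a dynamic sink would need. All class and method names below are hypothetical (this is not Flink API): the idea is that each record's topic resolves to a base path, newly discovered topics fall back to a per-topic directory under a default root, and a custom sink could then lazily keep one BucketingSink-style writer per resolved path.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical routing helper: maps a Kafka topic name to the base path
 * of the Hadoop filesystem it should be sunk into. A dynamic sink could
 * consult this on every record and lazily open a writer per base path.
 */
public class TopicPathRouter {
    private final Map<String, String> topicToBasePath = new HashMap<>();
    private final String defaultRoot;

    public TopicPathRouter(String defaultRoot) {
        this.defaultRoot = defaultRoot;
    }

    /** Register an explicit base path for a topic (e.g. a different HDFS cluster). */
    public void register(String topic, String basePath) {
        topicToBasePath.put(topic, basePath);
    }

    /**
     * Resolve the base path for a topic; unknown (newly discovered)
     * topics fall back to a per-topic directory under the default root,
     * so no restart is needed when the topic pattern picks up a new topic.
     */
    public String basePathFor(String topic) {
        return topicToBasePath.getOrDefault(topic, defaultRoot + "/" + topic);
    }

    public static void main(String[] args) {
        TopicPathRouter router = new TopicPathRouter("hdfs://default-nn:8020/data");
        router.register("audit", "hdfs://secure-nn:8020/audit");
        System.out.println(router.basePathFor("audit"));
        System.out.println(router.basePathFor("clicks")); // topic discovered later
    }
}
```

The dynamic part (adding new base paths as topics appear) then reduces to the sink keeping a map of open writers keyed by the resolved base path, creating one on first use, which is roughly what adapting the bucketing sink code would amount to.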