Hi Shree,

I think for every Iceberg table you have to instantiate a separate
sink in your program. You basically have one operator in front of your
sinks that decides where to route each record, so you would end up
with one Iceberg sink per customer; a rough sketch follows below.
Maybe you can also take a look at the DemultiplexingSink [1] but,
unfortunately, there has not been much progress on it yet.
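
Something along the lines of the following, untested sketch could
work: it routes records with side outputs and attaches one
iceberg-flink FlinkSink per customer. Note that the customer-id
position in the row, the bucket path pattern, and the
PerCustomerRouting wrapper are assumptions for illustration, not taken
from your job:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.table.data.RowData;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class PerCustomerRouting {

    // Assumes the records are already parsed into RowData and that the
    // customer id is a string at field position 0 -- adjust to your schema.
    public static void attachSinks(DataStream<RowData> source, List<String> customers) {
        // One side-output tag per customer.
        Map<String, OutputTag<RowData>> tags = new HashMap<>();
        for (String c : customers) {
            tags.put(c, new OutputTag<RowData>(c) {});
        }

        // The single routing operator in front of all sinks.
        SingleOutputStreamOperator<RowData> routed =
            source.process(new ProcessFunction<RowData, RowData>() {
                @Override
                public void processElement(RowData row, Context ctx, Collector<RowData> out) {
                    OutputTag<RowData> tag = tags.get(row.getString(0).toString());
                    if (tag != null) {
                        ctx.output(tag, row); // route to this customer's branch
                    }
                }
            });

        // One Iceberg sink per customer table; the path is made up.
        for (String c : customers) {
            FlinkSink.forRowData(routed.getSideOutput(tags.get(c)))
                .tableLoader(TableLoader.fromHadoopTable(
                    "s3://" + c + "-bucket/warehouse/db/events"))
                .append();
        }
    }
}

Keep in mind that every FlinkSink adds its own writer and committer
operators to the job graph, so with 100 tables the graph gets quite
large.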

Best,
Fabian

[1] https://issues.apache.org/jira/browse/FLINK-24493

On Mon, Nov 29, 2021 at 7:11 PM SHREEKANT ANKALA <ask...@hotmail.com> wrote:
>
> Hi,
>     Here is our scenario:
>
>     We have a system that generates data in a jsonl file for all of our
> customers together. We now need to process this jsonl data and conditionally
> distribute it to individual customers, based on their preferences, as Iceberg
> Tables. So for every line in the jsonl file, the data will end up in one of
> the customers' S3 buckets as an Iceberg table row. We were hoping to continue
> using Flink for this use case with just one job doing a conditional sink, but
> we are not sure if that would be the right usage of Flink.
>
> Thanks,
> Shree
> ________________________________
> From: Fabian Paul <fp...@apache.org>
> Sent: Monday, November 29, 2021 1:57 AM
> To: SHREEKANT ANKALA <ask...@hotmail.com>
> Cc: user@flink.apache.org <user@flink.apache.org>
> Subject: Re: How to Fan Out to 100s of Sinks
>
> Hi,
>
> What do you mean by "fan out" to 100 different sinks? Do you want to
> replicate the data in all buckets or is there some conditional
> branching logic?
>
> In general, Flink can easily support 100 different sinks but I am not
> sure if this is the right approach for your use case. Can you clarify
> your motivation and tell us a bit more about the exact scenario?
>
> Best,
> Fabian
>
>
>
> On Mon, Nov 29, 2021 at 1:11 AM SHREEKANT ANKALA <ask...@hotmail.com> wrote:
> >
> > Hi all, we currently have a Flink job that retrieves jsonl data from GCS and
> > writes to Iceberg Tables. We are using Flink 1.13.2 and things are working
> > fine.
> >
> > We now have to fan out that same data into 100 different sinks - Iceberg
> > Tables on S3. There will be 100 buckets, and the data needs to be sent to
> > each of these 100 different buckets.
> >
> > We are planning to add a new job that writes to one sink at a time,
> > launched once per sink. Is there any other, more optimal approach in
> > Flink to support this use case of 100 different sinks?
