Hi,

Here is our scenario: we have a system that generates data in a jsonl file for all customers together. We now need to process this jsonl data and conditionally distribute it to individual customers, based on their preferences, as Iceberg tables. So for every line in the jsonl file, the data will end up in one of the customers' S3 buckets as a row in an Iceberg table. We were hoping to keep using Flink for this use case with just one job doing a conditional sink, but we are not sure whether that would be the right usage of Flink.
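To make the branching concrete, here is a minimal sketch of the per-line routing decision we have in mind, in plain Java without Flink. The `customer_id` field name and the naive string extraction are illustrative assumptions; a real job would use a proper JSON parser and route each record to the matching customer's Iceberg sink.

```java
import java.util.*;

public class FanOutSketch {
    // Decide which customer a jsonl line belongs to.
    // Assumption: every line carries a "customer_id" string field.
    static String routeLine(String jsonLine) {
        String key = "\"customer_id\":\"";
        int start = jsonLine.indexOf(key) + key.length();
        int end = jsonLine.indexOf('"', start);
        return jsonLine.substring(start, end);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "{\"customer_id\":\"acme\",\"value\":1}",
            "{\"customer_id\":\"globex\",\"value\":2}");
        // Group lines per customer; in the real job each group would go
        // to that customer's Iceberg table in their S3 bucket.
        Map<String, List<String>> byCustomer = new TreeMap<>();
        for (String line : lines) {
            byCustomer.computeIfAbsent(routeLine(line), k -> new ArrayList<>())
                      .add(line);
        }
        System.out.println(byCustomer.keySet()); // prints [acme, globex]
    }
}
```

In Flink terms, this decision would sit in a `ProcessFunction` that emits each record to a per-customer side output (or a keyed stream), with one sink per customer attached downstream.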
Thanks,
Shree
________________________________
From: Fabian Paul <fp...@apache.org>
Sent: Monday, November 29, 2021 1:57 AM
To: SHREEKANT ANKALA <ask...@hotmail.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: Re: How to Fan Out to 100s of Sinks

Hi,

What do you mean by "fan out" to 100 different sinks? Do you want to replicate the data in all buckets, or is there some conditional branching logic? In general, Flink can easily support 100 different sinks, but I am not sure if this is the right approach for your use case. Can you clarify your motivation and tell us a bit more about the exact scenario?

Best,
Fabian

On Mon, Nov 29, 2021 at 1:11 AM SHREEKANT ANKALA <ask...@hotmail.com> wrote:
>
> Hi all, we currently have a Flink job that retrieves jsonl data from GCS and
> writes to Iceberg tables. We are using Flink 1.13.2 and things are working fine.
>
> We now have to fan out that same data into 100 different sinks - Iceberg
> tables on S3. There will be 100 buckets, and the data needs to be sent to each
> of these 100 different buckets.
>
> We are planning to add a new job that will write to one sink at a time for each
> time it is launched. Is there any other, more optimal approach possible in Flink to
> support this use case of 100 different sinks?