Thanks Robert for the reply.

On Fri, 29 May 2020, 12:31 Robert Metzger <rmetz...@apache.org> wrote:
> Hey Prasanna,
>
> (Side note: there is no need to send this email to multiple mailing
> lists. The user@ list is the right one.)
>
> Let me quickly go through your questions:
>
> Is this usecase suited for Flink?
>
> Based on the details you've provided: yes.
> What you also need to consider are the hardware requirements you'll have
> for processing such amounts of data. I can strongly recommend setting up a
> small demo environment to measure the throughput of a smaller Flink
> cluster (say 10 machines).
>
> 1) If you do not need any consistency guarantees (data loss is
> acceptable), and you have good infrastructure in place to deploy and
> monitor such microservices, then a microservice might also be an option.
> Flink is pretty well suited for heavy IO use-cases. AFAIK Netflix has
> talked at several Flink Forward conferences about similar cases (check
> YouTube for recorded talks).
>
> 2) It should not be too difficult to build a small, generic framework on
> top of the Flink APIs.
>
> 3) If you are deploying Flink on a resource manager like Kubernetes or
> YARN, it will take care of recovering your cluster if it goes down. Your
> recovery time will mostly depend on the state size that you are
> checkpointing (and the ability of your resource manager to bring up new
> resources). I don't think you'll be able to recover in < 500 milliseconds,
> but within a few seconds.
> I don't think that the other frameworks you are looking at are going to
> be much better at this.
>
> Best,
> Robert
>
> On Tue, May 19, 2020 at 1:28 PM Prasanna kumar <
> prasannakumarram...@gmail.com> wrote:
>
>> Hi,
>>
>> I have the following usecase to implement in my organization.
>>
>> Say there is a huge relational database (1000 tables for each of our
>> 30k customers) in our monolith setup.
>>
>> We want to reduce the load on the DB and prevent the applications from
>> hitting it for the latest events. So an extract is taken from the redo
>> logs onto Kafka.
>>
>> We need to set up a streaming platform: based on the table updates that
>> happen (read from Kafka), we need to form events and send them to the
>> consumers.
>>
>> Each consumer may be interested in the same table but in different
>> updates/columns, depending on their business needs, and we then deliver
>> the events to their endpoint/Kinesis/SQS/a Kafka topic.
>>
>> So the case here is *1* table update : *m* events : *n* sinks.
>> Peak load expected is easily 100k to a million table updates per second
>> (all customers put together).
>> Latency expected by most customers is less than a second, mostly in the
>> 100-500 ms range.
>>
>> Is this usecase suited for Flink?
>>
>> I went through the Flink book and documentation. These are the
>> questions I have:
>>
>> 1) If we have a situation like this (*1* table update : *m* events :
>> *n* sinks), is it better to write our own microservice, or is it better
>> to implement it through Flink?
>> 1 a) How does checkpointing happen if we have *1* input : *n*
>> output situations?
>> 1 b) There are no heavy transformations; at most we might check
>> that the required columns are present in the DB update and decide whether
>> to create an event. So there is an alternative thought of writing the
>> service in Node, since it is more IO and less processing.
>>
>> 2) I see that we write a job, deploy it, and Flink takes care of the
>> rest in handling parallelism, latency and throughput.
>> But what I need is to write a generic framework so that we are able to
>> handle any table structure; we should not end up writing one job driver
>> for each case.
>> There are at least 200 types of events in the existing monolith system
>> which might move to this new system once built.
>>
>> 3) How do we maintain Flink cluster HA? From the book, I get that
>> internal task-level failures are handled gracefully in Flink. But what if
>> the Flink cluster goes down, how do we make sure it is HA?
>> I had earlier worked with Spark and we had issues managing it. (It was
>> not much of a problem there, since the latency requirement was 15 minutes
>> and we could make sure to ramp another cluster up within that time.)
>> These are absolute realtime cases and we cannot miss even one
>> message/event.
>>
>> There are also thoughts of using Kafka Streams/Apache Storm for the
>> same. (They are being investigated by a different set of folks.)
>>
>> Thanks,
>> Prasanna.
>>
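
For concreteness, a minimal sketch of the *1* table update : *m* events : *n* sinks pattern in Flink's DataStream API (Java). The topic names, Kafka address and routing rule below are placeholders rather than anything decided in this thread; a real job would load per-table, per-subscriber routing from external configuration so that one job binary covers all event types instead of one driver per case.

// Sketch only, not production code. Addresses, topic names and the routing
// rule are assumptions for illustration.
import java.util.Properties;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.util.Collector;

public class TableChangeFanOutJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Periodic checkpoints snapshot source offsets plus operator state
        // for the whole pipeline, however many sinks are attached.
        env.enableCheckpointing(10_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");   // assumed address
        props.setProperty("group.id", "table-change-router");   // assumed group id

        // Source: the redo-log extract topic, payloads kept as JSON strings here.
        DataStream<String> changes = env.addSource(
                new FlinkKafkaConsumer<>("db-redo-log", new SimpleStringSchema(), props));

        // 1 table update -> m events: flatMap may emit zero or more events per
        // update, e.g. one per subscriber whose required columns are present.
        DataStream<String> events = changes.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String change, Collector<String> out) {
                // Placeholder rule; a real job would consult per-table,
                // per-subscriber configuration loaded from outside the job.
                if (change.contains("\"table\":\"orders\"")) {
                    out.collect(change);
                }
            }
        });

        // m events -> n sinks: attach one sink per destination. Kafka shown here;
        // Kinesis has an official connector, SQS/HTTP would need a custom SinkFunction.
        events.addSink(new FlinkKafkaProducer<>("outbound-events", new SimpleStringSchema(), props));

        env.execute("table-change-fan-out");
    }
}

This is also how question 1 a) is handled: Flink takes one consistent checkpoint of the whole dataflow regardless of how many outputs it has; end-to-end exactly-once delivery additionally needs transactional or idempotent sinks.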
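On question 3 (cluster HA): Flink's high-availability services keep JobManager metadata in durable storage, so a cluster restarted by Kubernetes or YARN resumes jobs from the last completed checkpoint. An illustrative flink-conf.yaml fragment; the ZooKeeper quorum, storage paths and restart-strategy values are placeholders only:

# high-availability services (ZooKeeper-based)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
high-availability.storageDir: s3://my-bucket/flink/ha/

# durable checkpoints so a recovered cluster can restore state
state.backend: rocksdb
state.checkpoints.dir: s3://my-bucket/flink/checkpoints/

# retry failed jobs instead of giving up
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10
restart-strategy.fixed-delay.delay: 5 s

Recovery time is then dominated by how quickly the resource manager brings up replacement processes and by the checkpointed state size, which lines up with Robert's estimate of a few seconds rather than sub-500 ms.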