Thanks Robert for the reply.

On Fri, 29 May 2020, 12:31 Robert Metzger <rmetz...@apache.org> wrote:
> Hey Prasanna,
>
> (Side note: there is no need to send this email to multiple mailing
> lists. The user@ list is the right one.)
>
> Let me quickly go through your questions:
>
> Is this usecase suited for Flink?
>
> Based on the details you've provided: yes.
> What you also need to consider are the hardware requirements you'll have
> for processing such amounts of data. I can strongly recommend setting up a
> small demo environment to measure the throughput of a smaller Flink
> cluster (say 10 machines).
>
> 1) If you do not need any consistency guarantees (data loss is
> acceptable), and you have good infrastructure in place to deploy and
> monitor such microservices, then a microservice might also be an option.
> Flink is pretty well suited for heavy IO use-cases. AFAIK Netflix has
> talked at several Flink Forward conferences about similar cases (check
> YouTube for recorded talks).
>
> 2) It should not be too difficult to build a small, generic framework on
> top of the Flink APIs.
>
> 3) If you are deploying Flink on a resource manager like Kubernetes or
> YARN, it will take care of recovering your cluster if it goes down. Your
> recovery time will mostly depend on the state size that you are
> checkpointing (and the ability of your resource manager to bring up new
> resources). I don't think you'll be able to recover in < 500 milliseconds,
> but within a few seconds.
> I don't think that the other frameworks you are looking at are going to
> be much better at this.
>
> Best,
> Robert
>
> On Tue, May 19, 2020 at 1:28 PM Prasanna kumar <
> prasannakumarram...@gmail.com> wrote:
>
>> Hi,
>>
>> I have the following usecase to implement in my organization.
>>
>> Say there is a huge relational database (1000 tables for each of our
>> 30k customers) in our monolith setup.
>>
>> We want to reduce the load on the DB and prevent the applications from
>> hitting it for the latest events. So an extract is taken from the redo
>> logs onto Kafka.
>>
>> We need to set up a streaming platform: based on the table updates that
>> happen (read from Kafka), we need to form events and send them to the
>> consumers.
>>
>> Each consumer may be interested in the same table but in different
>> updates/columns, depending on their business needs, and we then deliver
>> the events to their endpoint/Kinesis/SQS/a Kafka topic.
>>
>> So the case here is *1* table update : *m* events : *n* sinks.
>> Peak load expected is easily 100k to a million table updates per second
>> (all customers put together).
>> Latency expected by most customers is less than a second, mostly in the
>> 100-500 ms range.
>>
>> Is this usecase suited for Flink?
>>
>> I went through the Flink book and documentation. These are the
>> questions I have:
>>
>> 1) If we have a situation like this (*1* table update : *m* events :
>> *n* sinks), is it better to write our own microservice, or is it better
>> to implement it through Flink?
>> 1 a) How does checkpointing happen if we have *1* input : *n*
>> output situations?
>> 1 b) There are no heavy transformations; at most we might check
>> that the required columns are present in the DB update and decide whether
>> to create an event. So there is an alternative thought of writing the
>> service in Node, since it is more IO and less processing.
>>
>> 2) I see that we write a job, deploy it, and Flink takes care of the
>> rest in handling parallelism, latency and throughput.
>> But what I need is to write a generic framework so that we are able to
>> handle any table structure; we should not end up writing one job driver
>> for each case.
>> There are at least 200 types of events in the existing monolith system
>> which might move to this new system once built.
>>
>> 3) How do we maintain Flink cluster HA? From the book, I get that
>> internal task-level failures are handled gracefully in Flink. But what if
>> the Flink cluster goes down, how do we make sure it is HA?
>> I had earlier worked with Spark and we had issues managing it. (It was
>> not much of a problem there, since the latency requirement was 15 minutes
>> and we could make sure to ramp another cluster up within that time.)
>> These are absolute realtime cases and we cannot miss even one
>> message/event.
>>
>> There are also thoughts of using Kafka Streams/Apache Storm for the
>> same. (They are being investigated by a different set of folks.)
>>
>> Thanks,
>> Prasanna.
>>
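
For concreteness, a minimal sketch of the *1* table update : *m* events : *n* sinks pattern in Flink's DataStream API (Java). The topic names, Kafka address and routing rule below are placeholders rather than anything decided in this thread; a real job would load per-table, per-subscriber routing from external configuration so that one job binary covers all event types instead of one driver per case.

// Sketch only, not production code. Addresses, topic names and the routing
// rule are assumptions for illustration.
import java.util.Properties;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.util.Collector;

public class TableChangeFanOutJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Periodic checkpoints snapshot source offsets plus operator state
        // for the whole pipeline, however many sinks are attached.
        env.enableCheckpointing(10_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");   // assumed address
        props.setProperty("group.id", "table-change-router");   // assumed group id

        // Source: the redo-log extract topic, payloads kept as JSON strings here.
        DataStream<String> changes = env.addSource(
                new FlinkKafkaConsumer<>("db-redo-log", new SimpleStringSchema(), props));

        // 1 table update -> m events: flatMap may emit zero or more events per
        // update, e.g. one per subscriber whose required columns are present.
        DataStream<String> events = changes.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String change, Collector<String> out) {
                // Placeholder rule; a real job would consult per-table,
                // per-subscriber configuration loaded from outside the job.
                if (change.contains("\"table\":\"orders\"")) {
                    out.collect(change);
                }
            }
        });

        // m events -> n sinks: attach one sink per destination. Kafka shown here;
        // Kinesis has an official connector, SQS/HTTP would need a custom SinkFunction.
        events.addSink(new FlinkKafkaProducer<>("outbound-events", new SimpleStringSchema(), props));

        env.execute("table-change-fan-out");
    }
}

This is also how question 1 a) is handled: Flink takes one consistent checkpoint of the whole dataflow regardless of how many outputs it has; end-to-end exactly-once delivery additionally needs transactional or idempotent sinks.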
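On question 3 (cluster HA): Flink's high-availability services keep JobManager metadata in durable storage, so a cluster restarted by Kubernetes or YARN resumes jobs from the last completed checkpoint. An illustrative flink-conf.yaml fragment; the ZooKeeper quorum, storage paths and restart-strategy values are placeholders only:

# high-availability services (ZooKeeper-based)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
high-availability.storageDir: s3://my-bucket/flink/ha/

# durable checkpoints so a recovered cluster can restore state
state.backend: rocksdb
state.checkpoints.dir: s3://my-bucket/flink/checkpoints/

# retry failed jobs instead of giving up
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10
restart-strategy.fixed-delay.delay: 5 s

Recovery time is then dominated by how quickly the resource manager brings up replacement processes and by the checkpointed state size, which lines up with Robert's estimate of a few seconds rather than sub-500 ms.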