Ok, I don't see this as a good case for Stateful Functions. Statefun is a 
fantastic tool when you treat each function as an object with its own data 
and behavior, and these objects communicate with each other in arbitrary 
directions. Also, the Stateful Functions remote API is asynchronous, so you 
can use it to enrich data.

While I have a limited understanding of your problem, it looks like a classic 
DAG that can be implemented with the plain Flink DataStream API.

For instance:
1. Kafka source ->
2. keyBy your id ->
3. FlatMap to split each Kafka message into one message per ID and attach the 
total number of messages to each downstream message ->
4. RichAsyncFunction to enrich the ID and count with the data from Cassandra ->
5. Process with a function that:
        5.1. Sends incoming messages downstream and increases the count of seen 
messages
        5.2. Compares the count of seen messages to the total count. If they are 
equal, it sends an ack to a side output and clears the state
        5.3. If more messages are expected but nothing has arrived within some 
threshold duration, it sends an error to another side output
6. Three sinks: one for enriched outgoing messages, one for acks from the side 
output, and one for errors.

Or you can even combine acks and errors into the same sink.
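
To make it concrete, below is a rough sketch of steps 4-6 in the DataStream 
API (Java). It assumes the per-ID messages from step 3 arrive as 
Tuple3<correlationId, id, totalCount>; the class names, the 10-minute timeout 
and queryCassandraAsync(...) are hypothetical placeholders for your own types 
and Cassandra client, not a drop-in implementation:

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class EnrichmentPipelineSketch {

  // Side outputs for acks and errors (step 6)
  static final OutputTag<String> ACKS = new OutputTag<String>("acks") {};
  static final OutputTag<String> ERRORS = new OutputTag<String>("errors") {};

  // Step 4: async enrichment of (correlationId, id, totalCount) against Cassandra
  static class CassandraEnricher extends
      RichAsyncFunction<Tuple3<String, String, Integer>, Tuple3<String, String, Integer>> {
    @Override
    public void asyncInvoke(Tuple3<String, String, Integer> in,
                            ResultFuture<Tuple3<String, String, Integer>> resultFuture) {
      // queryCassandraAsync is a stand-in for your driver's async query
      queryCassandraAsync(in.f1).thenAccept(payload ->
          resultFuture.complete(Collections.singletonList(Tuple3.of(in.f0, payload, in.f2))));
    }
  }

  // Step 5: forward every record, ack when all IDs of a correlation id were seen,
  // and report an error if the key goes quiet before completion
  static class CompletionTracker extends
      KeyedProcessFunction<String, Tuple3<String, String, Integer>, Tuple3<String, String, Integer>> {
    private static final long TIMEOUT_MS = 10 * 60 * 1000; // inactivity threshold
    private transient ValueState<Integer> seen;
    private transient ValueState<Long> timer;

    @Override
    public void open(Configuration parameters) {
      seen = getRuntimeContext().getState(new ValueStateDescriptor<>("seen", Integer.class));
      timer = getRuntimeContext().getState(new ValueStateDescriptor<>("timer", Long.class));
    }

    @Override
    public void processElement(Tuple3<String, String, Integer> rec, Context ctx,
                               Collector<Tuple3<String, String, Integer>> out) throws Exception {
      out.collect(rec); // 5.1: send the enriched record downstream
      int count = seen.value() == null ? 1 : seen.value() + 1;
      if (timer.value() != null) {
        ctx.timerService().deleteProcessingTimeTimer(timer.value());
      }
      if (count >= rec.f2) { // 5.2: all IDs processed -> ack and clear state
        ctx.output(ACKS, "completed:" + rec.f0);
        seen.clear();
        timer.clear();
      } else { // 5.3: re-arm the inactivity timer
        seen.update(count);
        long deadline = ctx.timerService().currentProcessingTime() + TIMEOUT_MS;
        ctx.timerService().registerProcessingTimeTimer(deadline);
        timer.update(deadline);
      }
    }

    @Override
    public void onTimer(long ts, OnTimerContext ctx,
                        Collector<Tuple3<String, String, Integer>> out) throws Exception {
      ctx.output(ERRORS, "timeout:" + ctx.getCurrentKey());
      seen.clear();
      timer.clear();
    }
  }

  // Steps 4-6 wired together; Kafka source and splitting flatMap (steps 1-3) omitted
  static SingleOutputStreamOperator<Tuple3<String, String, Integer>> wire(
      DataStream<Tuple3<String, String, Integer>> perIdMessages) {
    DataStream<Tuple3<String, String, Integer>> enriched = AsyncDataStream.unorderedWait(
        perIdMessages, new CassandraEnricher(), 30, TimeUnit.SECONDS, 100);
    return enriched
        .keyBy(new KeySelector<Tuple3<String, String, Integer>, String>() {
          public String getKey(Tuple3<String, String, Integer> t) { return t.f0; }
        })
        .process(new CompletionTracker());
  }

  // Hypothetical helper; replace with a wrapper around your Cassandra session's async execute
  static CompletableFuture<String> queryCassandraAsync(String id) {
    return CompletableFuture.completedFuture("rows-for-" + id);
  }
}

The sinks then attach to the returned operator: the main output carries the 
enriched messages, while getSideOutput(ACKS) and getSideOutput(ERRORS) give 
you the ack and error streams. Because the timer is deleted and re-armed on 
every record, the error side output only fires for keys that go quiet before 
all of their IDs have been enriched.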

Does that work for you?

Best,
Tymur Yarosh
On 16 Aug 2022 at 13:40 +0300, Himanshu Sareen <himanshusar...@outlook.com> 
wrote:
> Hi Tymur,
>
> 1. Why do you want a remote function to call an embedded function in this 
> case?
>
>     To implement an Async I/O call. 
> https://nightlies.apache.org/flink/flink-statefun-docs-master/docs/sdk/flink-datastream/#completing-async-requests
>
> 2. Do you have separate outgoing Kafka messages or join them per record?
>
>     One DB query will fetch millions of records, which will be aggregated 
> into multiple separate events/messages to be published to Kafka.
>
> 3. What is explicit acknowledgment at the end for?
>
>      A single ingress event may have more than 500 IDs, so once the entire 
> event has been processed (i.e. the DB query is completed for all 500 IDs) we 
> have to send a trigger event for downstream consumers.
>
>
> Also, if there is a better way to code this problem, do share some pointers.
>
> Regards,
> Himanshu
> From: Tymur Yarosh <ti.yar...@gmail.com>
> Sent: Tuesday, August 16, 2022 2:37 PM
> To: user@flink.apache.org <user@flink.apache.org>; Himanshu Sareen 
> <himanshusar...@outlook.com>
> Subject: Re: Flink-Statefun : Design Solution advice ( ASYNC IO)
>
> Hi,
> Just a few questions:
> 1. Why do you want a remote function to call an embedded function in this 
> case?
> 2. Do you have separate outgoing Kafka messages or join them per record?
> 3. What is explicit acknowledgment at the end for?
>
> Best,
> Tymur Yarosh
> On 16 Aug 2022 at 11:48 +0300, Himanshu Sareen <himanshusar...@outlook.com> 
> wrote:
> > Team,
> >
> > I'm solving a use case and need advice/suggestions on whether Flink is the 
> > right choice for the solution.
> >
> > 1. Ingress - Kafka events/messages consist of multiple IDs.
> > 2. Task - For each ID in a Kafka message, query the Cassandra DB 
> > (asynchronously) to fetch records. Prepare multiple small messages out of 
> > the records received.
> > 3. Egress - Emit all the small messages/events to Kafka.
> >
> > An ingress event may contain more than 500 IDs.
> > Each ID may result in millions of records from the Cassandra DB.
> >
> >
> > My Design Approach
> >
> > 1. A remote stateful function listens to the ingress.
> > 2. It invokes an embedded function (which implements Async I/O) for each ID.
> > 3. Once records are available from the embedded function, it processes them 
> > and emits multiple events to Kafka.
> > 4. It sends back an acknowledgement to the calling remote function.
> >
> > Please share suggestions or advice.
> >
> > Note - I'm unable to find a good example of embedded and remote function 
> > interaction.
> >
> > Regards
