Thank you so much for creating the ticket, Igal. We are looking forward to 
being able to use it!

And thank you for giving a little more context about how StateFun keeps a 
connection pool and tries to optimize for performance and throughput.

With that said, gRPC is an architectural choice we have made. It would be 
better to maintain project consistency than to make exceptions here and 
there.

We will definitely take StateFun for a spin once we can use it with gRPC.

Cheers,

Dalmo



From: Igal Shilman <[email protected]>
Date: Wednesday, September 23, 2020 at 07:53
To: Dalmo Cirne <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Support for gRPC in Flink StateFun 2.x

Hi Dalmo,

Thanks a lot for sharing this use case!

If I understand the requirement correctly, you are mostly concerned with 
performance. In that case I've created
an issue [1] to add a gRPC transport for StateFun, and I believe we would be 
able to implement it in the upcoming weeks.

Just a side note about the way StateFun invokes remote functions via HTTP, at 
the moment:

- StateFun keeps a connection pool, to avoid re-establishing the connection for 
each request.
- StateFun batches requests per address (key) to amortize the cost of a round 
trip, and state shipment.
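
To make the batching point concrete, here is a minimal Python sketch of 
grouping pending messages per address so that each address costs one round 
trip instead of one per message. It is an illustration only, not StateFun's 
actual implementation; `batch_by_address` and the sample addresses are 
hypothetical names.

```python
from collections import defaultdict


def batch_by_address(messages):
    """Group (address, payload) pairs into one batch per address.

    Hypothetical helper: illustrates the idea of per-address batching,
    not how StateFun implements it internally.
    """
    batches = defaultdict(list)
    for address, payload in messages:
        # Messages for the same address keep their arrival order.
        batches[address].append(payload)
    return dict(batches)


# Three pending messages, two distinct addresses.
pending = [("user/alice", "m1"), ("user/bob", "m2"), ("user/alice", "m3")]
batches = batch_by_address(pending)

# Each batch would be sent as a single request over a pooled connection.
for address, payloads in batches.items():
    print(address, payloads)
```

With three messages for two addresses, only two requests would be needed.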

There is an RC2 for the upcoming StateFun version, with some improvements 
around HTTP functions,
and operational visibility (logs and metrics). So perhaps you can take that for 
a spin if you are evaluating StateFun
at the moment. The release itself is expected to happen at the end of this week.


[1] https://issues.apache.org/jira/browse/FLINK-19380

Thanks,
Igal.


On Tue, Sep 22, 2020 at 4:38 AM Dalmo Cirne <[email protected]> wrote:
Thank you for the quick reply, Igal.

Our use case is the following: A stream of data from Kafka is fed into Flink 
where data transformations take place. After that we send that transformed data 
to an inference engine to score the relevance of each record. (Rough 
simplification.)

Doing that using HTTP endpoints is possible, and it is the solution we have in 
place today. However, for each request to that endpoint we incur the cost of 
establishing the connection, etc., which increases the latency of the 
system.

We do process data in batches to mitigate the latency, but it is not the same 
as having a bi-directional stream, as would be possible with gRPC. 
Furthermore, we already use gRPC in other parts of our system.

We also want to be able to scale those endpoints up and down, as demand for the 
service fluctuates depending on the hour and day. Combining StateFun and 
Kubernetes would allow for that elasticity of the service, while keeping state 
of the execution, since inferences are not always just one endpoint, but a 
collection of them where the output of one becomes the input of the next, 
culminating with the predicted score(s).

We are evaluating StateFun because Flink is already part of the infrastructure. 
With that said, gRPC is also part of our requirements, hence the motivation for 
the question.

I’d love to hear more about plans to implement support for gRPC and perhaps 
become an early adopter.

I hope this helps with understanding the use case. Happy to talk further and 
answer more questions.

Best,

Dalmo



From: Igal Shilman <[email protected]>
Date: Saturday, September 19, 2020 at 01:41
To: Dalmo Cirne <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Support for gRPC in Flink StateFun 2.x

Hi,

Your observation is correct: currently the only way to invoke a remote function 
is through an HTTP POST request to a service that exposes a StateFun endpoint.

The endpoint must implement the client side of the “RequestReply” protocol as 
defined by StateFun (basically an invocation contains the state and message, 
and a response contains a description of the side effects).
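
As a rough sketch of that request/reply shape, assuming simplified stand-in 
types rather than StateFun's real protobuf messages: the invocation carries 
the function's state and the incoming message, and the response only 
describes side effects (state updates and outgoing messages) for the runtime 
to apply. `Invocation`, `Response`, `handle`, and the `greeter/out` target 
are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Invocation:
    """Hypothetical stand-in for a request: current state plus one message."""
    state: dict
    message: str


@dataclass
class Response:
    """Hypothetical stand-in for a reply: a description of side effects."""
    state_updates: dict = field(default_factory=dict)
    outgoing_messages: list = field(default_factory=list)


def handle(invocation):
    """A stateless handler: everything it needs arrives in the invocation."""
    seen = invocation.state.get("seen", 0) + 1
    return Response(
        # The function does not store state itself; it asks for an update.
        state_updates={"seen": seen},
        # Messages to other functions are also returned, not sent directly.
        outgoing_messages=[("greeter/out", f"{invocation.message} #{seen}")],
    )


reply = handle(Invocation(state={"seen": 2}, message="hello"))
print(reply)
```

Because the handler keeps nothing between calls, the transport underneath 
(HTTP today, gRPC later) can change without changing this contract.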

While gRPC can be easily added as a replacement for the transport layer, the 
client side (the remote function) would still have to implement the 
RequestReply protocol.

To truly utilize gRPC we would want to introduce a new type of protocol that 
can exploit the low-latency bi-directional streams to and from the function.

While for the latter it is a bit difficult to commit to a specific date, the 
former can be easily implemented in the next StateFun release.

Would you be able to share with us a little bit more about your original 
motivation for asking this question? :-)
This would help us as we gather more and more use cases.

For example: target language, environment, how gRPC services are being 
discovered.

Thanks,
Igal

On Thursday, September 17, 2020, Dalmo Cirne <[email protected]> wrote:
Hi,

At the latest Flink Forward, in April 2020, there were mentions that support 
for gRPC, in addition to HTTP, was in the works and would be implemented 
in the future.

Looking into the flink-statefun <https://github.com/apache/flink-statefun> 
repository on GitHub, one can see that there is already some work done with 
gRPC, but parity with its HTTP counterpart is not there yet.

Is there a roadmap or an estimate of when gRPC will be implemented in StateFun?

Thank you,

Dalmo