Hi Igal,

Thank you for getting back so quickly.

All of our applications are currently deployed onto the one Ververica cluster, so I would be quite keen to evaluate the DataStream integration option (I am currently hitting an exception where the ObjectMapper in DefaultHttpRequestReplyClientSpec does not support Java 8's java.time.Duration).

While I work through that, I would be obliged if you could direct me as to how I can deploy the equivalent of the master/worker containers on Ververica. Would it be as easy as creating a new Flink application in Java, porting the module.yml configuration with the relevant dependencies into it, and then deploying that jar?

This is a nice middle-ground option, where the statefun state could be managed outside of the calling application while still offering the separation you referred to, on the same cluster. I am also thinking that the same statefun master/worker could be used to route all traffic in the future, assuming the load is tolerable, but that is further down the line.

Thanks again, I really appreciate your insights.

Barry
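[For anyone landing on this thread with the same java.time.Duration mapping error: the usual Jackson fix is to register the JavaTimeModule from the jackson-datatype-jsr310 artifact. This is a sketch of the general Jackson behaviour, not a patch for the ObjectMapper inside StateFun's DefaultHttpRequestReplyClientSpec, which is internal to the library.]

```java
import java.time.Duration;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

public class DurationMappingDemo {
    public static void main(String[] args) throws Exception {
        // Out of the box, ObjectMapper has no (de)serializers for java.time
        // types and fails on java.time.Duration; registering JavaTimeModule
        // (from jackson-datatype-jsr310) adds them.
        ObjectMapper mapper = new ObjectMapper()
                .registerModule(new JavaTimeModule());

        String json = mapper.writeValueAsString(Duration.ofSeconds(90));
        Duration back = mapper.readValue(json, Duration.class);
        System.out.println(back.equals(Duration.ofSeconds(90))); // true
    }
}
```

The round trip works regardless of whether durations are written as decimal seconds (the default) or as ISO-8601 strings (with WRITE_DURATIONS_AS_TIMESTAMPS disabled).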
On 2021/09/02 13:09:13, Igal Shilman <i...@apache.org> wrote:
> Hi Barry,
> I've forwarded your email to the user mailing list as it is more suitable
> here :-)
>
> Your question definitely makes sense, and let me try to provide you with
> some pointers:
>
> 1. The architecture that you've outlined has many advantages and is
> desirable if you can afford it. Some of them are:
> - clean separation of concerns
> - better resource isolation
> - different SLO and fault domains (a failure or slowness in your Python
> function doesn't trigger a failure/back-pressure in your ETL)
> - you can use event-time watermarks for your ETL (statefun only works with
> processing time)
>
> 2. If you would still prefer to merge the two, then you can check out the
> DataStream integration API [1],
> although it has some rough edges with respect to working with remote
> functions in particular.
>
> Good luck,
> Igal.
>
> [1]
> https://nightlies.apache.org/flink/flink-statefun-docs-release-3.1/docs/sdk/flink-datastream/
>
> On Thu, Sep 2, 2021 at 1:07 PM Barry Higgins <barry.p.higgi...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I have set up a remote stateful function in Python which I've deployed
> > on an AWS EC2 box. I am interacting with this from a separate statefun
> > docker container running 2 flink-statefun images with roles master and
> > worker (on a separate EC2 instance). The ingress and egress points for
> > this function are Kafka.
> >
> > I then have a separate Java application using Flink, deployed on a
> > Ververica cluster. From this application I am communicating with the
> > statefun function by adding a sink/source pointing at the
> > ingress/egress above.
> >
> > I have a couple of questions on this setup.
> >
> > I am unsure if there is a better way to communicate with the function
> > from the Flink application.
> > I am wondering if there is any way that I can use the existing deployed
> > application to maintain the state of my remote function, meaning that
> > I could discard the statefun master/worker elements?
> > Failing that, do I just need to create a new Flink application,
> > translate the equivalent of the module.yml that is passed to the
> > existing master/worker to Java, add the dependencies and deploy that
> > jar?
> >
> > I hope that makes sense.
> > Kindest Regards,
> >
> > Barry
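[For readers following the module.yml discussion: the file in question is StateFun's remote-module descriptor. A minimal sketch of what one looks like in the 3.x format follows; the endpoint URL, Kafka addresses, topic names, and the com.example namespace are all placeholders, not the actual configuration from this thread.]

```yaml
# Remote function endpoint: where the StateFun master/worker dispatch requests.
kind: io.statefun.endpoints.v2/http
spec:
  functions: com.example/*
  urlPathTemplate: http://my-python-host:8000/statefun
---
# Kafka ingress feeding records to the functions.
kind: io.statefun.kafka.v1/ingress
spec:
  id: com.example/my-ingress
  address: kafka-broker:9092
  consumerGroupId: my-group
  topics:
    - topic: requests
      valueType: com.example/Request
      targets:
        - com.example/my-function
---
# Kafka egress for results.
kind: io.statefun.kafka.v1/egress
spec:
  id: com.example/my-egress
  address: kafka-broker:9092
  deliverySemantic:
    type: exactly-once
    transactionTimeout: 15min
```

As far as I can tell, "translating this to Java" via the DataStream integration API [1] amounts to expressing the same bindings programmatically (the remote endpoint, the ingress stream, and the egress id) against the statefun-flink-datastream dependency, rather than reading a YAML file at all.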