Hi Igal,
Thank you for getting back so quickly.
All of our applications are currently deployed onto the one Ververica cluster, 
so I would be quite keen to get the DataStream integration option evaluated (I 
am currently hitting an exception where the ObjectMapper in 
DefaultHttpRequestReplyClientSpec does not support the Java 8 java.time.Duration type). 
While I muddle through that, I would be obliged if you could direct me as to 
how I can deploy the equivalent of the master/worker container on Ververica.
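For reference, the kind of fix I'm attempting for that exception is registering Jackson's JavaTimeModule on the mapper (this assumes jackson-datatype-jsr310 is on the classpath and that the mapper in question is actually configurable from my side; the class names below are just a standalone sketch, not StateFun internals):

```java
import java.time.Duration;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

public class DurationJsonDemo {
    public static void main(String[] args) throws Exception {
        // Jackson cannot handle java.time types out of the box; the
        // jackson-datatype-jsr310 JavaTimeModule adds the (de)serializers.
        ObjectMapper mapper = new ObjectMapper()
                .registerModule(new JavaTimeModule())
                // write ISO-8601 strings ("PT10S") rather than decimal seconds
                .disable(SerializationFeature.WRITE_DURATIONS_AS_TIMESTAMPS);

        String json = mapper.writeValueAsString(Duration.ofSeconds(10));
        System.out.println(json); // "PT10S"

        Duration back = mapper.readValue(json, Duration.class);
        System.out.println(back.equals(Duration.ofSeconds(10))); // true
    }
}
```

Without the module registered, the same writeValueAsString call fails with an InvalidDefinitionException, which looks a lot like what I'm seeing.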
Would it be as easy as creating a new Flink application in Java, porting the 
module.yml configuration with the relevant dependencies into it, and then 
deploying that jar?
This would be a nice middle-ground option: the StateFun state could be managed 
outside of the calling application whilst still offering the separation you 
referred to, all on the same cluster.
I am thinking that the same StateFun Flink master/worker could be used to 
route all traffic in the future, assuming the load was tolerable, but that is 
further down the line.
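For context, my rough (uncompiled, untested) understanding of what that port would look like with the DataStream integration API, going by the docs page you linked; the function type, endpoint URL, and stream/egress names here are placeholders for whatever is in my module.yml:

```java
// Assumed placeholders: routableMessages is a DataStream<RoutableMessage>
// built from my Kafka source, EGRESS_ID is an EgressIdentifier<String>,
// and env is the StreamExecutionEnvironment.
FunctionType remoteType = new FunctionType("example", "greeter");

StatefulFunctionEgressStreams out =
    StatefulFunctionDataStreamBuilder.builder("example-module")
        // any DataStream<RoutableMessage> can act as the ingress
        .withDataStreamAsIngress(routableMessages)
        // roughly the equivalent of the http endpoint block in module.yml
        .withRequestReplyRemoteFunction(
            RequestReplyFunctionBuilder
                .requestReplyFunctionBuilder(
                    remoteType, URI.create("http://remote-host:8000/statefun"))
                .withMaxRequestDuration(Duration.ofSeconds(15))
                .withMaxNumBatchRequests(500))
        .withEgressId(EGRESS_ID)
        .build(env);

// the egress comes back as an ordinary DataStream
DataStream<String> replies = out.getDataStreamForEgressId(EGRESS_ID);
```

If that is broadly right, the master/worker containers would be replaced by this job running on the Ververica cluster.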
Thanks again, I really appreciate your insights.
Barry

On 2021/09/02 13:09:13, Igal Shilman <i...@apache.org> wrote: 
> Hi Barry,
> I've forwarded your email to the user mailing list as it is more suitable
> here :-)
> 
> Your question definitely makes sense, and let me try to provide you with
> some pointers:
> 
> 1. The architecture that you've outlined has many advantages and is
> desirable if you can
> afford it. Some of them are:
> - clean separation of concerns
> - better resource isolation
> - different SLO and fault domains (a failure/slowness in your Python
> function doesn't trigger a failure/back-pressure in your ETL)
> - you can use event-time watermarks for your ETL (StateFun only works with
> processing time)
> 
> 2. If you would still prefer to merge the two, then you can check out the
> DataStream integration API [1],
> although it has some rough edges with respect to working with remote
> functions in particular.
> 
> Good luck,
> Igal.
> 
> 
> [1]
> https://nightlies.apache.org/flink/flink-statefun-docs-release-3.1/docs/sdk/flink-datastream/
> 
> 
> On Thu, Sep 2, 2021 at 1:07 PM Barry Higgins <barry.p.higgi...@gmail.com>
> wrote:
> 
> > Hi,
> >
> > I have set up a remote stateful function in python which I’ve deployed
> > on an AWS EC2 box. I am interacting with this from a separate statefun
> > docker container running 2 flink-statefun images with roles master and
> > worker (on a separate EC2 instance). The ingress and egress points for
> > this function are Kafka.
> >
> > I then have a separate Java application using Flink, deployed on a
> > Ververica cluster. From this application I am communicating with the
> > statefun function by adding a sink/source pointing at the
> > ingress/egress above.
> >
> > I have a couple of questions on this setup.
> >
> > I am unsure if there is a better way to communicate with the function
> > from the Flink application.
> > I am wondering if there is any way that I can use the existing deployed
> > application to maintain the state of my remote function, meaning that
> > I can discard the statefun master/worker elements?
> > Failing that, do I just need to create a new Flink application,
> > translate the equivalent of the module.yml that is passed to the
> > existing master/worker to Java, add the dependencies and deploy that
> > jar?
> >
> > I hope that makes sense?
> > Kindest Regards,
> >
> > Barry
> >
> 
