Dear Igal, all, Thanks a lot. This is very helpful. I understand the architecture a bit more now. We can just scale the stateful functions and put a load balancer in front and Flink will contact them. The only part of the scaling I don't understand yet is how to scale the 'Flink side'. So If I understand correctly the Kafka ingress/egress parts runs on the Flink cluster and contacts the remote workers through HTTP. How can I scale this Kafka part then? For a normal Flink job I would just change the parallelism, but I couldn't really find that option yet. Is there some value I need to set in the module.yaml.
Once again, thanks for the help so far. It has been useful. Regards, Wouter Op wo 13 mei 2020 om 00:03 schreef Igal Shilman <i...@ververica.com>: > Hi Wouter, > > Triggering a stateful function from a frontend indeed requires an ingress > between them, so the way you've approached this is also the way we were > thinking of. > As Gordon mentioned a potential improvement might be an HTTP ingress, that > would allow triggering stateful functions directly from the front end > servers. > But this kind of ingress is not implemented yet. > > Regarding scaling: Your understanding is correct, you can scale both the > Flink cluster and the remote "python-stateful-function" cluster > independently. > Scaling the Flink cluster, tho, requires taking a savepoint, bumping the > job parallelism, and starting the cluster with more workers from the > savepoint taken previously. > > Scaling "python-stateful-function" workers can be done transparently to > the Flink cluster, but the exact details are deployment specific. > - For example the python workers are a k8s service. > - Or the python workers are deployed behind a load balancer > - Or you add new entries to the DNS record of your python worker. > > I didn't understand "ensuring that it ends op in the correct Flink job" > can you please clarify? > Flink would be the one contacting the remote workers and not the other way > around. So as long as the new instances > are visible to Flink they would be reached with the same shared state. > > I'd recommend watching [1] and the demo at the end, and [2] for a demo > using stateful functions on AWS lambda. > > [1] https://youtu.be/NF0hXZfUyqE > [2] https://www.youtube.com/watch?v=tuSylBadNSo > > It seems like you are on the correct path! > Good luck! > Igal. > > > On Tue, May 12, 2020 at 11:18 PM Wouter Zorgdrager <zorgdrag...@gmail.com> > wrote: > >> Hi Igal, all, >> >> In the meantime we found a way to serve Flink stateful functions in a >> frontend. We decided to add another (set of) Flask application(s) which >> link to Kafka topics. These Kafka topics then serve as ingress and egress >> for the statefun cluster. However, we're wondering how we can scale this >> cluster. On the documentation page some nice figures are provided for >> different setups but no implementation details are given. In our case we >> are using a remote cluster so we have a Docker instance containing the >> `python-stateful-function` and of course the Flink cluster containing a >> `master` and `worker`. If I understood correctly, in a remote setting, we >> can scale both the Flink cluster and the `python-stateful-function`. >> Scaling the Flink cluster is trivial because I can add just more >> workers/task-managers (providing more taskslots) just by scaling the worker >> instance. However, how can I scale the stateful function also ensuring that >> it ends op in the correct Flink job (because we need shared state there). I >> tried scaling the Docker instance as well but that didn't seem to work. >> >> Hope you can give me some leads there. >> Thanks in advance! >> >> Kind regards, >> Wouter >> >> Op do 7 mei 2020 om 17:17 schreef Wouter Zorgdrager < >> zorgdrag...@gmail.com>: >> >>> Hi Igal, >>> >>> Thanks for your quick reply. Getting back to point 2, I was wondering if >>> you could trigger indeed a stateful function directly from Flask and also >>> get the reply there instead of using Kafka in between. We want to >>> experiment running stateful functions behind a front-end (which should be >>> able to trigger a function), but we're a bit afraid that using Kafka >>> doesn't scale well if on the frontend side a user has to consume all Kafka >>> messages to find the correct reply/output for a certain request/input. Any >>> thoughts? >>> >>> Thanks in advance, >>> Wouter >>> >>> Op do 7 mei 2020 om 10:51 schreef Igal Shilman <i...@ververica.com>: >>> >>>> Hi Wouter! >>>> >>>> Glad to read that you are using Flink for quite some time, and also >>>> exploring with StateFun! >>>> >>>> 1) yes it is correct and you can follow the Dockerhub contribution PR >>>> at [1] >>>> >>>> 2) I’m not sure I understand what do you mean by trigger from the >>>> browser. >>>> If you mean, for testing / illustration purposes triggering the >>>> function independently of StateFun, you would need to write some JavaScript >>>> and preform the POST (assuming CORS are enabled) >>>> Let me know if you’d like getting further information of how to do it. >>>> Broadly speaking, GET is traditionally used to get data from a resource >>>> and POST to send data (the data is the invocation batch in our case). >>>> >>>> One easier walk around for you would be to expose another endpoint in >>>> your Flask application, and call your stateful function directly from there >>>> (possibly populating the function argument with values taken from the query >>>> params) >>>> >>>> 3) I would expect a performance loss when going from the embedded SDK >>>> to the remote one, simply because the remote function is at a different >>>> process, and a round trip is required. There are different ways of >>>> deployment even for remote functions. >>>> For example they can be co-located with the Task managers and >>>> communicate via the loop back device /Unix domain socket, or they can be >>>> deployed behind a load balancer with an auto-scaler, and thus reacting to >>>> higher request rate/latency increases by spinning new instances (something >>>> that is not yet supported with the embedded API) >>>> >>>> Good luck, >>>> Igal. >>>> >>>> >>>> >>>> >>>> >>>> [1] https://github.com/docker-library/official-images/pull/7749 >>>> >>>> >>>> On Wednesday, May 6, 2020, Wouter Zorgdrager <zorgdrag...@gmail.com> >>>> wrote: >>>> >>>>> Hi all, >>>>> >>>>> I've been using Flink for quite some time now and for a university >>>>> project I'm planning to experiment with statefun. During the walkthrough >>>>> I've run into some issues, I hope you can help me with. >>>>> >>>>> 1) Is it correct that the Docker image of statefun is not yet >>>>> published? I couldn't find it anywhere, but was able to run it by building >>>>> the image myself. >>>>> 2) In the example project using the Python SDK, it uses Flask to >>>>> expose a function using POST. Is there also a way to serve GET request so >>>>> that you can trigger a stateful function by for instance using your >>>>> browser? >>>>> 3) Do you expect a lot of performance loss when using the Python SDK >>>>> over Java? >>>>> >>>>> Thanks in advance! >>>>> >>>>> Regards, >>>>> Wouter >>>>> >>>>