Hi Seth, Thanks for your comment. I've seen that repository in the past and it was really helpful to "validate" that this was the way to go. I don't think my question is addressed there, though: how could one add dynamic behavior to your TransactionManager? In this case, state that is available to all TransactionManager instances when they receive a message of type Transaction for the first time.
Seth Wiesman <sjwies...@gmail.com> wrote on Tuesday, 23/02/2021 at 16:02: > Hey Miguel, > > What you are describing is exactly what is implemented in this repo. The > TransactionManager function acts as an orchestrator to work with the other > functions. The repo is structured as an exercise but the full solution > exists on the branch `advanced-solution`. > > https://github.com/ververica/flink-statefun-workshop > > On Tue, Feb 23, 2021 at 8:34 AM Miguel Araújo <upwarr...@gmail.com> wrote: > >> Another possibility I am considering is handling this in Flink using a >> broadcast and adding all the information needed to the event itself. I'm a >> little concerned about the amount of data that will be serialized and sent >> on every request though, as I'll need to include information about all >> available remote functions, for instance. >> >> Miguel Araújo <upwarr...@gmail.com> wrote on Tuesday, 23/02/2021 >> at 09:14: >> >>> Hi Gordon, Igal, >>> >>> Thanks for your replies. >>> PubSub would be a good addition; I have a few scenarios where that would >>> be useful. >>> >>> However, after reading your answers I realized that your proposed >>> solutions (which address the most obvious interpretation of my question) do >>> not necessarily solve my problem. I should have just stated what it was, >>> instead of trying to propose a solution by discussing broadcast... >>> >>> I'm trying to implement an "orchestrator" function which, given an >>> event, will trigger multiple remote function calls, aggregate their results >>> and eventually call yet more functions (based on a provided dependency >>> graph). Hence, this orchestrator function has state per event_id and each >>> function instance is short-lived (a couple of seconds at most, ideally >>> sub-second). The question then is not about how to modify a long-running >>> function instance (which PubSub would enable), but rather how to have the >>> dependency graph available to new functions.
>>> >>> Given this, Igal's answer seems promising because we have the >>> FunctionProvider instantiating a local variable and passing it down on >>> every instantiation. I'm assuming there is one FunctionProvider per >>> TaskManager. Is there an easy way to have the FunctionProvider receive >>> data coming from a Flink DataStream, or receive StateFun messages? >>> Otherwise, I could have it subscribe to a Kafka topic directly. >>> >>> I really appreciate your help. >>> >>> Miguel >>> >>> Igal Shilman <i...@ververica.com> wrote on Monday, 22/02/2021 >>> at 12:09: >>> >>>> Hi Miguel, >>>> >>>> I think that there are a couple of ways to achieve this, and it really >>>> depends on your specific use case, and the trade-offs >>>> that you are willing to accept. >>>> >>>> For example, one way to approach this: >>>> - Suppose you have an external service somewhere that returns a >>>> representation of the logic to be interpreted by >>>> your function at runtime (I think that is the scenario you are >>>> describing) >>>> - Then, you can write a background task (a thread) that periodically >>>> queries that service, and keeps in memory the latest version. >>>> - You can initialize this background task in your FunctionProvider >>>> implementation, or even in your StatefulModule if you wish. >>>> - Then, make sure that your dynamic stateful function has access to >>>> the latest value fetched by your client (for example via a shared reference >>>> like a j.u.c.AtomicReference) >>>> - Then on receive, you can simply get that reference and re-apply your >>>> rules. >>>> >>>> Take a look at [1] for example (it is not exactly the same, but I >>>> believe that it is close enough) >>>> >>>> [1] >>>> https://github.com/apache/flink-statefun/blob/master/statefun-examples/statefun-async-example/src/main/java/org/apache/flink/statefun/examples/async/Module.java >>>> >>>> Good luck, >>>> Igal.
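The steps Igal outlines can be sketched in plain Java, independent of the StateFun SDK. This is only a sketch under assumptions: `RulesRefresher` is a hypothetical name, and the `Supplier<String>` fetcher stands in for the call to the external rules service. A FunctionProvider could create one of these and hand the shared `AtomicReference` to every function instance it builds.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Background task that periodically fetches the latest rule set and publishes
// it through an AtomicReference shared with all function instances.
public class RulesRefresher {
    private final AtomicReference<String> latestRules = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // 'fetcher' stands in for querying the external service (hypothetical).
    public RulesRefresher(Supplier<String> fetcher, long periodSeconds) {
        // Fetch once eagerly so functions never observe a null value.
        latestRules.set(fetcher.get());
        scheduler.scheduleAtFixedRate(
                () -> latestRules.set(fetcher.get()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    // The FunctionProvider would pass this reference to each function it
    // instantiates; on each invocation the function reads the current value.
    public AtomicReference<String> rules() {
        return latestRules;
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Because the reference is only read on receive, updates from the background thread become visible to short-lived function instances without any redeployment.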
>>>> >>>> >>>> On Mon, Feb 22, 2021 at 4:32 AM Tzu-Li (Gordon) Tai < >>>> tzuli...@apache.org> wrote: >>>> >>>>> Hi, >>>>> >>>>> FWIW, there is this JIRA that is tracking a pubsub / broadcast >>>>> messaging primitive in StateFun: >>>>> https://issues.apache.org/jira/browse/FLINK-16319 >>>>> >>>>> This is probably what you are looking for. And I do agree, in the case >>>>> that the control stream (which updates the application logic) is high >>>>> volume, redeploying functions may not work well. >>>>> >>>>> I don't think there really is a "recommended" way of doing the >>>>> "broadcast control stream, join with main stream" pattern with StateFun at >>>>> the moment, at least without FLINK-16319. >>>>> On the other hand, it could be possible to use stateful functions to >>>>> implement a pub-sub model in user space for the time being. I've actually >>>>> left some ideas for implementing that in the comments of FLINK-16319. >>>>> >>>>> Cheers, >>>>> Gordon >>>>> >>>>> >>>>> On Mon, Feb 22, 2021 at 6:38 AM Miguel Araújo <upwarr...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi everyone, >>>>>> >>>>>> What is the recommended way of achieving the equivalent of a >>>>>> broadcast in Flink when using Stateful Functions? >>>>>> >>>>>> For instance, assume we are implementing something similar to Flink's >>>>>> demo fraud detection >>>>>> <https://flink.apache.org/news/2020/03/24/demo-fraud-detection-2.html> >>>>>> but >>>>>> in Stateful Functions - how can one dynamically update the application's >>>>>> logic then? >>>>>> There was a similar question in this mailing list in the past where >>>>>> it was recommended moving the dynamic logic to a remote function >>>>>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Stateful-Functions-ML-model-prediction-tp38286p38302.html> >>>>>> so >>>>>> that one could achieve that by deploying a new container. 
I think that's >>>>>> not very realistic as updates might happen with a frequency that's not >>>>>> compatible with that approach (e.g., sticking to the fraud detection >>>>>> example, updating fraud detection rules every hour is not unusual), nor >>>>>> should one be deploying a new container when data (not code) changes. >>>>>> >>>>>> Is there a way of, for example, modifying FunctionProviders >>>>>> <https://ci.apache.org/projects/flink/flink-statefun-docs-master/sdk/java.html#function-providers-and-dependency-injection> >>>>>> on the fly? >>>>>> >>>>>> Thanks, >>>>>> Miguel >>>>>> >>>>>
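As a reference point for the user-space pub/sub that Gordon mentions, one possible shape is a stateful "topic" function that persists its subscriber list and fans messages out on publish. The sketch below is SDK-independent and entirely hypothetical in its names; the `send` callback stands in for sending a StateFun message to another function address, and the subscriber set would live in the broker function's persisted state.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.BiConsumer;

// Sketch of a user-space pub/sub broker: subscribe registers a function
// address, publish fans the payload out to every registered subscriber.
public class TopicBroker {
    // In a real stateful function this set would be persisted state.
    private final Set<String> subscribers = new LinkedHashSet<>();
    // Stands in for delivering a message to another function address.
    private final BiConsumer<String, String> send;

    public TopicBroker(BiConsumer<String, String> send) {
        this.send = send;
    }

    // Handle a SUBSCRIBE message from another function.
    public void onSubscribe(String subscriberAddress) {
        subscribers.add(subscriberAddress);
    }

    // Handle a PUBLISH message: forward the payload to every subscriber.
    public void onPublish(String payload) {
        for (String address : subscribers) {
            send.accept(address, payload);
        }
    }
}
```

A control stream could then publish rule updates to such a topic, and interested functions would subscribe once and receive every subsequent update as an ordinary message.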