Hey everyone, I've been looking at Flink to handle a fairly complex use case and was hoping for some feedback on whether the approach I'm considering seems reasonable. From what I've seen of what people build on Flink, the focus tends to be on running fewer heavyweight/complex jobs, whereas the approach I'm thinking about involves executing many smaller, more lightweight jobs.
The core idea is that we have a lot (think 100s or 1000s) of incoming data streams (maybe via something like Apache Pulsar), and we have rules of varying complexity that need to be evaluated against individual streams. If a rule matches, an event needs to be emitted to an output stream. A rule could be as simple as "in any event, if you see field X set to value 'foo', it's a match," or more complex, like "if you see an event of type A followed by an event of type B followed by an event of type C within a certain time window, it's a match." These rules are long-running (could be hours, days, weeks, or longer).

It *seems* to me like Application Mode (https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/overview/) with the Kubernetes Operator (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/overview/#application-deployments), which creates a new cluster per application, is what I'd want. I'm envisioning each of these long-running rules (each potentially reading a different data stream) as its own job in its own application (maybe later some can be combined, but to start, they'll all be separate).

Does that seem like the right approach to running a number of somewhat small jobs concurrently on Flink? Are there any "gotchas" to this I'm not thinking of? Any alternate approaches worth considering? Are there any users we know of who do something like this currently?

Thanks for your time and insight!
~Brent
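P.S. For concreteness, here's a rough, untested sketch of how I picture one of the "A then B then C" rules as its own standalone job, using Flink CEP. The Event shape, field names, and job name are all made up for illustration, and the Pulsar source and output sink are elided:

import java.util.List;
import java.util.Map;

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SequenceRuleJob {

    // Minimal event shape for the sketch (a plain Flink POJO).
    public static class Event {
        public String key;
        public String type;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Source elided: in practice this would be a Pulsar source
        // reading the one stream this rule is responsible for.
        DataStream<Event> events = env.fromElements(new Event());

        // "Event A, then B, then C, within 10 minutes" as a CEP pattern.
        Pattern<Event, ?> rule = Pattern.<Event>begin("a")
                .where(SimpleCondition.of(e -> "A".equals(e.type)))
                .followedBy("b")
                .where(SimpleCondition.of(e -> "B".equals(e.type)))
                .followedBy("c")
                .where(SimpleCondition.of(e -> "C".equals(e.type)))
                .within(Time.minutes(10));

        DataStream<String> matches = CEP
                .pattern(events.keyBy(e -> e.key), rule)
                .process(new PatternProcessFunction<Event, String>() {
                    @Override
                    public void processMatch(Map<String, List<Event>> match,
                                             Context ctx,
                                             Collector<String> out) {
                        // Emit one record per full A->B->C match.
                        out.collect("rule matched for key "
                                + match.get("a").get(0).key);
                    }
                });

        // Sink elided: this would write to the rule's output stream.
        matches.print();

        env.execute("sequence-rule");
    }
}

The idea would be that each rule's main() gets packaged as its own application jar and deployed by the operator as its own application (and hence its own cluster), which is what leads to the "many small jobs" shape I described above.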