Hi All,

It'd be great to consider stream processing as a platform for our upcoming projects. Flink seems to be the closest match.

However, we have numerous stream processing workloads and would want to be able to scale up to thousands of different streams, each quite similar in structure/sequence but with very different functional logic. For example, there is always a "validate" stage, but what that means depends on the client/data/context etc. and would typically map to a few lines of script. In essence, our sequences can often be deconstructed into 8-12 Python snippets, so the serverless/functional paradigm seems to fit well.

Whilst we can readily deploy our functions to a FaaS/k8s platform or similar (which seems to fit the bill for remote functions), I don't yet see how to quickly draw these together into a dynamic stream. My initial thought would be to create a very general-purpose stream job that then works through the context, mapping functions to Flink tasks based on the client dataset. E.g. some pseudo code:

    ingress()
    extract()
    getDynamicStreamFunctionDefs()
    getFunction1()
    runFunction1()
    abortOnError()
    getFunction2()
    runFunction2()
    abortOnError()
    ...
    getFunction10()
    runFunction10()
    sinkData()

Most functions are not, however, simple lexical operations or extractors/mappers; on the whole they require a few database/API calls to retrieve things like previous data, configuration, etc. They are not necessarily long-running, but async execution is certainly a consideration. I think every stage will be a UDF (and a meta-UDF at that).

As a result I'm not sure whether we can get this to fit without a brittle set of workarounds that would ruin any benefit of running through Flink, but it would be great to hear the opinions of others who might have tackled this kind of dynamic tasking. I'm happy to explain this better if it isn't clear; I've also put a few rough sketches in a P.S. below to illustrate what I mean.

With best regards,

Rob

Rob Shepherd BEng PhD
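
P.S. To make the "general-purpose job" idea a little more concrete, here is a rough plain-Python sketch of the dispatch loop I have in mind; inside Flink I imagine this sitting in a process-style UDF or being expressed as message passing between functions. Every name below (Stage, load_stage_definitions, run_pipeline) is invented purely for illustration.

    # Sketch only: every name here is made up, to show the shape of a "generic"
    # job that maps a client's configured stages onto a fixed processing loop.
    from dataclasses import dataclass
    from typing import Callable, Optional


    @dataclass
    class Stage:
        name: str                   # e.g. "validate", "enrich", "transform"
        fn: Callable[[dict], dict]  # the small per-client snippet for this stage


    def load_stage_definitions(client_id: str) -> list[Stage]:
        # In reality this would read the client's pipeline definition from a
        # DB/API and resolve each entry to a deployed (remote) function.
        return [
            Stage("validate", lambda rec: {**rec, "valid": bool(rec.get("payload"))}),
            Stage("enrich", lambda rec: {**rec, "region": "eu-west"}),
        ]


    def run_pipeline(client_id: str, record: dict) -> Optional[dict]:
        # Apply the client's stages in order, aborting on the first failure
        # (the abortOnError() step from the pseudo code above).
        for stage in load_stage_definitions(client_id):
            try:
                record = stage.fn(record)
            except Exception as exc:
                print(f"aborting at stage '{stage.name}': {exc}")
                return None
            if record.get("valid") is False:
                return None
        return record  # sinkData() would take over from here


    if __name__ == "__main__":
        print(run_pipeline("client-42", {"payload": "hello"}))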
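
And roughly how I picture one of those snippets looking if we go down the remote functions route. This assumes the Stateful Functions Python SDK (statefun 3.x); the typenames, key and validation logic are placeholders I've made up.

    # Assumes the Apache Flink Stateful Functions Python SDK (apache-flink-statefun);
    # typenames such as "rob.example/validate" are placeholders.
    from statefun import StatefulFunctions, Context, Message, message_builder

    functions = StatefulFunctions()


    @functions.bind(typename="rob.example/validate")
    async def validate(context: Context, message: Message):
        record = message.as_string()

        # "Validate" means something different per client/dataset; a trivial
        # non-empty check stands in for the real few-line snippet here.
        if not record:
            return  # dropping the message stands in for abortOnError()

        # Forward to the next stage; in the real thing the target would be
        # looked up from the client's pipeline definition rather than fixed.
        context.send(
            message_builder(target_typename="rob.example/enrich",
                            target_id="some-record-key",
                            str_value=record))

The bit I can't yet see is how best to make that target lookup dynamic per client without hard-wiring the chain.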
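
Finally, a sketch of the sort of async I/O most of our stages need to do (a couple of small DB/API lookups per record). This assumes aiohttp, and the endpoints are invented.

    # Assumes aiohttp; the endpoints are invented. The point is only that the
    # two or three small lookups a stage needs can be overlapped per record.
    import asyncio
    import aiohttp


    async def fetch_json(session: aiohttp.ClientSession, url: str) -> dict:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.json()


    async def enrich(record: dict) -> dict:
        async with aiohttp.ClientSession() as session:
            config, previous = await asyncio.gather(
                fetch_json(session, f"https://config.example.internal/clients/{record['client']}"),
                fetch_json(session, f"https://history.example.internal/records/{record['id']}"),
            )
        return {**record, "config": config, "previous": previous}

Whether that kind of per-record I/O is better expressed through Flink's async I/O operators or pushed out to the remote functions themselves is exactly the sort of thing I'd value opinions on.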
However we have numerous stream processing workloads and would want to be able to scale up to 1000's different streams; each quite similar in structure/sequence but with the functional logic being very different in each. For example, there is always a "validate" stage - but what that means is dependant on the client/data/context etc and would typically map to a few line of script to perform. In essence, our sequences can often be deconstructed down to 8-12 python snippets and the serverless/functional paradigm seems to fit well. Whilst we can deploy our functions readily to a faas/k8s or something (which seems to fit the bill with remote functions) I don't yet see how to quickly draw these together in a dynamic stream. My initial thoughts would be to create a very general purpose stream job that then works through the context of mapping functions to flink tasks based on the client dataset. E.g. some pseudo code: ingress() extract() getDynamicStreamFunctionDefs() getFunction1() runFunction1() abortOnError() getFunction2() runFunction2() abortOnError() ... getFunction10() runFunction10() sinkData() Most functions are not however simple lexical operations, or extractors/mappers - but on the whole require a few database/API calls to retrieve things like previous data, configurations etc. They are not necessarily long running but certainly Async is a consideration. I think every stage will be UDFs (and then Meta-UDFs at that) As a result I'm not sure if we can get this to fit without a brittle set of workarounds, and ruin any benefit of running through flink etc... but it would great to hear opinions of others who might have tackled this kind of dynamic tasking. I'm happy to explain this better if it isn't clear. With best regards Rob Rob Shepherd BEng PhD