Hi All,

It'd be great to consider stream processing as a platform for our upcoming
projects, and Flink seems to be the closest match.

However, we have numerous stream processing workloads and would want to be
able to scale up to thousands of different streams; each is quite similar in
structure/sequence, but the functional logic is very different in each.

For example, there is always a "validate" stage - but what that means depends
on the client/data/context etc. and would typically map to a few lines of
script.

In essence, our sequences can usually be decomposed into 8-12 Python snippets,
and the serverless/functional paradigm seems to fit well.
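
For a sense of scale, one of those snippets might be no bigger than the
following (the field names and rules are invented purely for illustration):

# hypothetical per-client "validate" snippet - field names/rules are
# invented here for illustration only
def validate(record: dict) -> dict:
    if not record.get("customer_id"):
        raise ValueError("missing customer_id")
    if record.get("amount", 0) < 0:
        raise ValueError("amount must be non-negative")
    return record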

Whilst we can readily deploy our functions to a FaaS/k8s platform or similar
(which seems to fit the bill with remote functions), I don't yet see how to
quickly draw these together into a dynamic stream.

My initial thought would be to create a very general-purpose stream job that
maps functions to Flink tasks based on the context of the client dataset.

E.g. some pseudo code:

data = ingress()
data = extract(data)
chain = getDynamicStreamFunctionDefs(data)   # this client's 8-12 snippet definitions

data = runFunction(getFunction1(chain), data)
abortOnError(data)

data = runFunction(getFunction2(chain), data)
abortOnError(data)

...

data = runFunction(getFunction10(chain), data)
abortOnError(data)

sinkData(data)
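
If it helps to be concrete, my rough mental model in PyFlink would be a single
generic map operator (a "meta-UDF") that loads each client's chain of snippets
and runs them in order. This is only a sketch; load_chains_from_config_store
and the inline snippets are hypothetical stand-ins, not anything we have today:

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import MapFunction, RuntimeContext


def load_chains_from_config_store():
    # hypothetical stand-in for a lookup against a real config store;
    # here it just hard-wires a two-stage chain for one client
    def validate(rec):
        if "payload" not in rec:
            raise ValueError("missing payload")
        return rec

    def enrich(rec):
        return {**rec, "enriched": True}

    return {"acme": [validate, enrich]}


class DynamicChainMapper(MapFunction):
    # generic "meta-UDF": runs whatever chain of snippets is configured
    # for the client that each record belongs to
    def open(self, runtime_context: RuntimeContext):
        self.chains = load_chains_from_config_store()

    def map(self, record):
        for fn in self.chains[record["client"]]:   # e.g. validate, enrich, ...
            record = fn(record)                    # abort-on-error would raise here
        return record


env = StreamExecutionEnvironment.get_execution_environment()
env.from_collection([{"client": "acme", "payload": "..."}]) \
   .map(DynamicChainMapper()) \
   .print()
env.execute("dynamic-chain-sketch")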

However, most functions are not simple lexical operations or
extractors/mappers; on the whole they require a few database/API calls to
retrieve things like previous data, configurations, etc.

They are not necessarily long-running, but async execution is certainly a
consideration.
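
As an illustration, a typical stage looks more like the following than a pure
mapper (the aiohttp usage and the config-service URL are just assumptions for
the example):

import asyncio

import aiohttp


# hypothetical enrichment snippet: the service URL and response shape
# are assumptions for illustration only
async def enrich(record: dict) -> dict:
    async with aiohttp.ClientSession() as session:
        url = "https://config.example.internal/clients/" + record["client"]
        async with session.get(url) as resp:
            record["config"] = await resp.json()
    return record


# e.g. asyncio.run(enrich({"client": "acme"}))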

I think every stage will be a UDF (and a meta-UDF at that, since each stage
dispatches to client-specific logic).

As a result I'm not sure whether we can make this fit without a brittle set of
workarounds that would ruin any benefit of running through Flink, so it would
be great to hear opinions from others who might have tackled this kind of
dynamic tasking.


I'm happy to explain this better if it isn't clear.

With best regards

Rob




Rob Shepherd BEng PhD
