Re: Remote Stateful Function Scalability

Tzu-Li (Gordon) Tai Sat, 17 Oct 2020 19:29:47 -0700

Hi Elias,

On Sun, Oct 18, 2020 at 6:16 AM Elias Levy <fearsome.lucid...@gmail.com>
wrote:


> After reading the Stateful Functions documentation, I am left wondering
> how remote stateful functions scale.
>
> The documentation mentions that the use of remote functions allows the
> state and compute tiers to scale independently. But the documentation seems
> to imply that only a single instance of a function type can execute at a
> time per worker ("*When an application starts, each parallel worker of
> the framework will create one physical object per function type. This
> object will be used to execute all logical instances of that type that are
> run by that particular worker.*") That would seem to tie and limit the
> parallelism of the compute layer to that of the storage layer even when
> using remote functions.
>

Your observation is correct only for embedded functions, not for remote
functions.
For remote functions, the physical object that each StateFun worker creates
per function type does not execute the function itself. It acts as an
asynchronous invocation dispatcher to that type's remote function service.

To briefly summarize what the dispatcher does:
The dispatcher *only ensures sequential invocation per logical address*
(function type + logical instance ID / key).
Invocations for different logical addresses (different types / different
keys) can happen concurrently.

If an invocation request for a logical address is in-flight, and other
messages targeted for that address arrive, they are buffered in a backlog
(state) until the pending request completes.
Upon completion, the backlog is flushed and all buffered messages are sent
to the remote function as a single batch invocation request.
Backpressure is applied once the backlog size reaches a threshold.
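
As a rough illustration of the dispatching rules described above, here is a
minimal Python sketch. It is a toy model and not StateFun's actual
implementation: the class name `Dispatcher`, the `send` callback, and the
per-address `max_backlog` threshold are all assumptions made for the example.

```python
from collections import defaultdict


class Dispatcher:
    """Toy model of per-address dispatching (hypothetical, not StateFun code)."""

    def __init__(self, send, max_backlog=2):
        # send(address, batch) stands in for an asynchronous request
        # to the remote function service.
        self.send = send
        # Assumed per-address threshold at which backpressure kicks in.
        self.max_backlog = max_backlog
        self.in_flight = set()
        self.backlog = defaultdict(list)

    def on_message(self, address, message):
        """Handle an incoming message.

        Returns False when the caller should apply backpressure.
        """
        if address not in self.in_flight:
            # No pending request for this address: invoke right away.
            self.in_flight.add(address)
            self.send(address, [message])
            return True
        # A request is pending for this address: buffer the message.
        self.backlog[address].append(message)
        return len(self.backlog[address]) < self.max_backlog

    def on_complete(self, address):
        """Called when the pending request for `address` has completed."""
        batch = self.backlog.pop(address, [])
        if batch:
            # Flush the whole backlog as one batch; address stays in flight.
            self.send(address, batch)
        else:
            self.in_flight.discard(address)


if __name__ == "__main__":
    d = Dispatcher(lambda addr, batch: print(addr, batch))
    d.on_message("greeter/alice", "m1")  # sent immediately
    d.on_message("greeter/bob", "m2")    # different key: sent concurrently
    d.on_message("greeter/alice", "m3")  # buffered behind m1
    d.on_complete("greeter/alice")       # flushes [m3] as a single batch
```

Note that messages for different addresses never wait on each other here;
only messages for the same address are serialized.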

All in all, in vanilla Flink terms, this works similarly to Flink's Async
I/O in unordered mode, i.e. without the stream order being preserved.

So, to conclude by answering your specific questions:


>
> Can a worker execute multiple concurrent remote stateful functions of
> different types?
>

Yes.


>
> Can a worker execute multiple concurrent remote stateful functions of the
> same type with different keys?
>

Yes.


>
> If a worker can execute multiple concurrent remote stateful functions of
> the same type with different keys, does it ensure their output is ordered
> like its inputs?
>

No. Currently, StateFun emits outgoing messages (i.e. messages going to
other functions / egresses) purely in the order in which the concurrent
invocation requests complete.
However, I believe that it should be possible to support an ordered mode
here at the cost of extra latency (early completions would need to be
buffered, which adds checkpoint overhead, etc.).
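
To make the buffering cost concrete, here is a minimal Python sketch of what
such an ordered mode could look like. It is purely hypothetical and does not
exist in StateFun: the class `OrderedEmitter` and the idea of tagging each
invocation with an input sequence number are assumptions for the example.

```python
class OrderedEmitter:
    """Reorder buffer releasing outputs in input order (hypothetical sketch)."""

    def __init__(self):
        self.next_seq = 0   # sequence number of the next invocation to release
        self.pending = {}   # early completions, keyed by sequence number

    def complete(self, seq, outputs):
        """Record that invocation `seq` completed with `outputs`.

        Returns the outputs that can now be emitted in input order.
        """
        self.pending[seq] = outputs
        released = []
        # Release every consecutive completion starting at next_seq.
        while self.next_seq in self.pending:
            released.extend(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


if __name__ == "__main__":
    emitter = OrderedEmitter()
    print(emitter.complete(1, ["b"]))  # invocation 1 finished early: held back
    print(emitter.complete(0, ["a"]))  # invocation 0 done: both released
```

Everything held in `pending` would have to be part of checkpointed state,
and an early completion is delayed until all earlier invocations finish,
which is where the extra latency and checkpoint overhead would come from.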

Hope this helps clarify some things!

Cheers,
Gordon
