Re: Remote Stateful Function Scalability

Seth Wiesman Tue, 20 Oct 2020 10:41:12 -0700

As a note, I wrote that concepts section before remote functions were
implemented. I've made a note to myself to go through and update it.


Seth

On Sat, Oct 17, 2020 at 9:29 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:

> Hi Elias,
>
> On Sun, Oct 18, 2020 at 6:16 AM Elias Levy <fearsome.lucid...@gmail.com>
> wrote:
>
>> After reading the Stateful Functions documentation, I am left wondering
>> how remote stateful functions scale.
>>
>> The documentation mentions that the use of remote functions allows the
>> state and compute tiers to scale independently. But the documentation seems
>> to imply that only a single instance of a function type can execute at a
>> time per worker ("*When an application starts, each parallel worker of
>> the framework will create one physical object per function type. This
>> object will be used to execute all logical instances of that type that are
>> run by that particular worker.*") That would seem to tie and limit the
>> parallelism of the compute layer to that of the storage layer even when
>> using remote functions.
>>
>
> Your observation is correct only for embedded functions, not for remote
> functions.
> For remote functions, in the StateFun workers each physical object per
> function type acts as an asynchronous invocation dispatcher to the type's
> remote function service.
>
> Just to quickly brief what the dispatcher does:
> The dispatcher *only ensures sequential invocation per logical address*
> (function type + logical instance ID / key).
> Invocations for different logical addresses (different types / different
> keys) can happen concurrently.
>
> If an invocation request for a logical address is in-flight, and other
> messages targeted for that address arrive, they are buffered in a backlog
> (state) until the pending request completes.
> Upon completion, the backlog is flushed and all buffered messages are sent
> to the remote function as a single batch invocation request.
> Backpressure is applied once the backlog size reaches a threshold.
>
> All in all, in vanilla Flink-land terms, this works similarly to Flink's
> AsyncIO without the stream order preserved.
>
> So, to conclude by answering your specific questions:
>
>
>>
>> Can a worker execute multiple concurrent remote stateful functions of
>> different types?
>>
>
> Yes.
>
>
>>
>> Can a worker execute multiple concurrent remote stateful functions of the
>> same type with different keys?
>>
>
> Yes.
>
>
>>
>> If a worker can execute multiple concurrent remote stateful functions of
>> the same type with different keys, does it ensure their output is ordered
>> like its inputs?
>>
>
> No, currently StateFun handles outgoing messages (i.e. messages going to
> other functions / egresses) only based on the order that the concurrent
> invocation requests complete.
> However, I believe that it should be possible to support an ordered mode
> here at the cost of extra latency (early completes need to be buffered,
> checkpoint overhead etc.).
>
> Hope this helps clarify some things!
>
> Cheers,
> Gordon
>

Re: Remote Stateful Function Scalability

Reply via email to