Hi Tim!

Indeed, the StateFun SDK / StateFun runtime has an internal concept of
batching that kicks in in the presence of a slow/congested remote
function. Keep in mind that under normal circumstances batching does not
happen (effectively a batch of size 1 is sent) [1].
This batch is not currently exposed via the SDKs (neither Java nor
Python), as it is an implementation detail (see [2]).
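
To illustrate what happens today, here is a minimal, self-contained
sketch (the names are illustrative, not the SDK's actual internals): the
runtime hands the SDK a batch of invocations, and the SDK unrolls it so
that the bound function sees one message at a time:

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Invocation:
    argument: bytes  # serialized payload of a single message

def dispatch(fn: Callable[[bytes], None], batch: List[Invocation]) -> None:
    # Roughly what the SDK does today (see [2]): the batch is unrolled,
    # and the user function is invoked once per message.
    for invocation in batch:
        fn(invocation.argument)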

The way I understand your message (please correct me if I'm wrong) is
that evaluating the ML model is costly, and it would benefit from some
form of batching (as Pandas does, I assume?) instead of being applied to
every event individually.
If that is the case, perhaps exposing this batch would be a useful
feature to add.

For example:

@functions.bind(..)
def ml(context, messages: typing.List[Message]):
    ...
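
To make the benefit concrete: inside such a batch-aware handler, the
model could be evaluated once over the whole batch instead of once per
event. A rough Pandas sketch (the payload layout and the "feature" column
are made up for illustration; a real model call would replace the
arithmetic):

import pandas as pd

def score(payloads: list) -> list:
    # Build one DataFrame from the whole batch and evaluate it vectorized,
    # instead of calling the model once per event.
    df = pd.DataFrame(payloads)        # one row per message payload
    df["score"] = df["feature"] * 2.0  # stand-in for the real model
    return df["score"].tolist()

# e.g. score([{"feature": 1.0}, {"feature": 2.5}]) -> [2.0, 5.0]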



Let me know what you think,
Igal.



[1] https://github.com/apache/flink-statefun/blob/master/statefun-sdk-protos/src/main/protobuf/sdk/request-reply.proto#L80
[2] https://github.com/apache/flink-statefun/blob/master/statefun-sdk-python/statefun/request_reply_v3.py#L219

On Fri, Apr 16, 2021 at 11:48 PM Timothy Bess <tdbga...@gmail.com> wrote:

> Hi everyone,
>
> Is there a good way to access the batch of leads that Statefun sends to
> the Python SDK rather than processing events one by one? We're trying to
> run our data scientist's machine learning model through the SDK, but the
> code is very slow when we do single events and we don't get many of the
> benefits of Pandas/etc.
>
> Thanks,
>
> Tim
>
