Hi Tim! Indeed, the StateFun runtime has an internal concept of batching that kicks in in the presence of a slow/congested remote function. Keep in mind that under normal circumstances batching does not happen (effectively, a batch of size 1 is sent) [1]. This batch is not currently exposed via the SDKs (either Java or Python), as it is considered an implementation detail (see [2]).
The way I understand your message (please correct me if I'm wrong) is that evaluating the ML model is costly, and it would benefit from some sort of batching (as pandas does, I assume?) instead of being applied to every event individually. If this is the case, perhaps exposing this batch could be a useful feature to add. For example:

@functions.bind_tim(..)
def ml(context, messages: typing.List[Message]):
    ...

Let me know what you think,
Igal.

[1] https://github.com/apache/flink-statefun/blob/master/statefun-sdk-protos/src/main/protobuf/sdk/request-reply.proto#L80
[2] https://github.com/apache/flink-statefun/blob/master/statefun-sdk-python/statefun/request_reply_v3.py#L219

On Fri, Apr 16, 2021 at 11:48 PM Timothy Bess <tdbga...@gmail.com> wrote:

> Hi everyone,
>
> Is there a good way to access the batch of leads that Statefun sends to
> the Python SDK rather than processing events one by one? We're trying to
> run our data scientist's machine learning model through the SDK, but the
> code is very slow when we do single events and we don't get many of the
> benefits of Pandas/etc.
>
> Thanks,
> Tim
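P.S. To make the motivation concrete, here is a standalone sketch (not StateFun SDK code; the batch-accepting binding above is only a proposal) of how handing a whole list of events to a pandas-based model amortizes the per-call cost. The event fields and the "model" here are made up for illustration:

```python
import pandas as pd

def score_batch(events):
    # Build one DataFrame from the whole batch, so the (placeholder)
    # model's vectorized computation runs once per batch instead of
    # once per event.
    df = pd.DataFrame(events)
    # Stand-in for a real model prediction: a vectorized expression
    # over the batch. A real handler would call model.predict(df).
    return (df["clicks"] / df["impressions"]).tolist()

scores = score_batch([
    {"clicks": 3, "impressions": 10},
    {"clicks": 1, "impressions": 4},
])
```

With a batch of size 1 (today's common case) this degenerates to per-event scoring, which is exactly the overhead Tim is describing.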