Hi all,

About our use case -

We have many clients connected via websockets through api gateway on
AWS, these clients submit events of various types periodically, each
event contains a session_id (generated by the client), the session
ends when there's no activity for a specified duration of time. We
have a sequence model (RNN) written in PyTorch that needs to send
predictions back to the clients for each event that is being sent. The
state should contain the raw events sent per session and after a local
transformation it can serve as an input to the model.

Flink seems like a natural fit for our app (no session store is a
winner) but we are seriously struggling to understand how to implement
it and with which API, currently it boils down to PyFlink or Beam.

What do you think?

As I mentioned, our model runs with Python, we are currently trying to
use the PyFlink Datastream API, but now we're not sure whether we can
have a Kinesis data stream source in Python. Is it supported?

As for the execution environment, we'd like to use a managed service
such as Kinesis Data Analytics, or perhaps we should use EMR with
flink installed... What is recommended?

Note that we don't care so much about durability and it's ok for our
use-case to lose sessions in case of failures, so with this assumption
we'd like to build the fastest performing setup.

Appreciate any additional pointers for implementing our app.

Thanks,
Des.

Reply via email to