Hi again,
One more thing to mention: it could also be the case that StateFun
reaches a write timeout trying to write the accumulated batch to the remote
function (when the remote functions are overloaded).
The requests are retried automatically, but you may still want to bump
these timeouts
Hi Jan,
I haven't stumbled upon this but I will try to reconstruct that scenario
with a stress test and report back.
Can you share a little bit about your environment? For example, do you use
gunicorn, nginx, aiohttp, or Flask perhaps?
I'd suggest maybe checking for request size limit parameters
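
One way to narrow this down on the Python side might be to log whenever the
received body is shorter than the declared Content-Length. Below is a minimal
sketch, assuming your HTTP framework exposes the request headers as a mapping
and the raw body as bytes. The helper names declared_length and is_truncated
are hypothetical and not part of the StateFun SDK.

```python
from typing import Mapping, Optional


def declared_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length value, or None if absent or malformed."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "content-length":
            try:
                return int(value)
            except ValueError:
                return None
    return None


def is_truncated(headers: Mapping[str, str], body: bytes) -> bool:
    """True when the body is shorter than the declared Content-Length."""
    expected = declared_length(headers)
    return expected is not None and len(body) < expected
```

Note that this only helps when the request carries a Content-Length header
(it reports nothing for chunked transfer encoding), and some servers may
reject a short body before your handler ever sees it.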
Thanks for reporting this, it does indeed look like a potential bug.
I filed this Jira for it: https://issues.apache.org/jira/browse/FLINK-22729
Could you share (here or in Jira) what the stack on the Python Worker side
is (for example which HTTP server)? Do you know if the message truncation
happens
Hi,
recently we started seeing the following faulty behaviour in the Flink
Stateful Functions HTTP communication towards external Python workers.
This is only occurring when the system is under heavy load.
The Java Application will send HTTP Messages to an external Python
Function but the ext