Hi again,
One more thing to mention: it could also be the case that StateFun hits a
write timeout while trying to write the accumulated batch to the remote
function (when the remote functions are overloaded).
The requests are retried automatically, but you may still want to bump
these timeouts.
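
For reference, here is a rough sketch of where these timeouts sit, assuming
the endpoint spec layout of a StateFun 3.0 module.yaml. The function
namespace, the URL, and all duration values below are made up for
illustration, and the exact nesting may differ in your StateFun version, so
please check it against the docs for the release you run.

```yaml
# Fragment of an HTTP endpoint spec (assumed StateFun 3.0 module.yaml layout).
spec:
  functions: example/*                                    # hypothetical namespace
  urlPathTemplate: http://python-worker:8000/statefun     # hypothetical URL
  timeouts:
    call: 2 min      # total budget for one invocation, including retries
    connect: 20 sec  # time allowed to establish the connection
    read: 20 sec     # time allowed to read the response
    write: 20 sec    # time allowed to write the request (the batch)
```

Since the write timeout covers sending the accumulated batch, that is the
one I would raise first if the remote functions are overloaded.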
Hi Jan,
I haven't stumbled upon this but I will try to reconstruct that scenario
with a stress test and report back.
Can you share a little bit about your environment? For example, do you use
gunicorn, nginx, aiohttp, or Flask perhaps?
I'd suggest checking the request size limit parameters.
Thanks for reporting this, it does indeed look like a potential bug.
I filed this Jira for it: https://issues.apache.org/jira/browse/FLINK-22729
Could you share (here or in Jira) what the stack on the Python Worker side
is (for example which HTTP server)? Do you know if the message truncation
happens