[ https://issues.apache.org/jira/browse/SPARK-51667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim reassigned SPARK-51667:
------------------------------------

    Assignee: Jungtaek Lim

> [TWS + Python] Disable Nagle's algorithm between Python worker and State Server
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-51667
>                 URL: https://issues.apache.org/jira/browse/SPARK-51667
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>              Labels: pull-request-available
>
> While testing TWS + Python, we found cases where the socket
> communication for state interaction was delayed by more than 40ms for
> certain state operations, e.g. ListState.put(), ListState.get(),
> ListState.appendList(), etc.
> The root cause is the combination of Nagle's algorithm and
> delayed ACK. The sequence is as follows:
> # Python worker sends the proto message to JVM, and flushes the socket.
> # Python worker then sends the follow-up data to JVM, and flushes
> the socket.
> # JVM reads the proto message, and realizes there is follow-up data.
> # JVM reads the follow-up data.
> # JVM processes the request, and sends the response back to Python worker.
> Due to delayed ACK, even after step 3 the JVM does not send an ACK back to
> the Python worker. It is waiting for some data or multiple ACKs to send, but
> the JVM is not going to send any data during that phase.
> Due to Nagle's algorithm, the message from step 2 is not sent to the JVM,
> since the message from step 1 has not been ACKed yet.
> This deadlock situation is resolved only after the delayed ACK timeout, which
> is 40ms (minimum duration) on Linux. After the timeout, the ACK is sent back
> from the JVM to the Python worker, and Nagle's algorithm finally allows the
> message from step 2 to be sent to the JVM.
> See the articles below for a more general explanation:
> * [https://engineering.avast.io/40-millisecond-bug/]
> ** Start reading from the Nagle's algorithm section
> * [https://brooker.co.za/blog/2024/05/09/nagle.html]
> Nagle's algorithm reduces the number of small packets, which, as the above
> article states, can help keep routers from being overloaded. We connect to
> "localhost" here, so that concern does not apply.
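
For illustration only, here is a minimal Python sketch of the write-write-read pattern described in the issue, with Nagle's algorithm disabled on the client socket through the standard TCP_NODELAY option. It is not the actual Spark change. The helper names (recv_exact, serve_once, request_with_nodelay), the 4-byte length prefix, and the "OK" reply are assumptions made up for this example. The server thread is only a stand-in for the JVM side.

```python
import socket
import threading
from typing import Tuple


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def serve_once(listener: socket.socket) -> None:
    """Hypothetical stand-in for the JVM side of the exchange."""
    conn, _ = listener.accept()
    with conn:
        header = recv_exact(conn, 4)                # step 3: read the first message
        size = int.from_bytes(header, "big")
        payload = recv_exact(conn, size)            # step 4: read the follow-up data
        reply = b"OK" + len(payload).to_bytes(4, "big")
        conn.sendall(reply)                         # step 5: send the response


def request_with_nodelay(payload: bytes) -> Tuple[int, bytes]:
    """Send a header and a follow-up write over loopback with TCP_NODELAY set."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    server.start()
    try:
        with socket.create_connection(("127.0.0.1", port)) as client:
            # Disable Nagle's algorithm: small writes go out immediately
            # instead of waiting for the ACK of the previous segment.
            client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            nodelay = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
            client.sendall(len(payload).to_bytes(4, "big"))  # step 1: first message
            client.sendall(payload)                          # step 2: follow-up data
            reply = recv_exact(client, 6)
    finally:
        server.join(timeout=5)
        listener.close()
    return nodelay, reply


if __name__ == "__main__":
    flag, response = request_with_nodelay(b"hello")
    print("TCP_NODELAY enabled:", flag != 0)
    print("reply:", response)
```

Without the setsockopt call, the second sendall may be held back until the first segment is ACKed, which is the stall described above. Whether the delay shows up in practice depends on the OS and its delayed ACK behavior, so this sketch does not try to measure it.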