Jungtaek Lim created SPARK-51667:
------------------------------------

             Summary: [TWS + Python] Disable Nagle's algorithm between Python 
worker and State Server
                 Key: SPARK-51667
                 URL: https://issues.apache.org/jira/browse/SPARK-51667
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 4.0.0, 4.1.0
            Reporter: Jungtaek Lim


While testing TWS + Python, we found a case where the socket 
communication for state interaction was delayed by more than 40ms for certain 
state operations, e.g. ListState.put(), ListState.get(), ListState.appendList(), 
etc.

The root cause is the combination of Nagle's algorithm and delayed ACK. The 
sequence is as follows:
 # Python worker sends the proto message to JVM, and flushes the socket.
 # Additionally, Python worker sends the follow-up data to JVM, and flushes the 
socket.
 # JVM reads the proto message, and realizes there is follow-up data.
 # JVM reads the follow-up data.
 # JVM processes the request, and sends the response back to Python worker.
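
The write-write-read pattern in the steps above can be sketched roughly as 
follows. This is only an illustration, not the actual Spark code: the 4-byte 
length header stands in for the proto message, and the names serve_once, 
recv_exact and request are hypothetical.

```python
import socket
import struct
import threading


def recv_exact(conn, n):
    """Read exactly n bytes from conn, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before sending all data")
        buf.extend(chunk)
    return bytes(buf)


def serve_once(listener, received):
    """Stand-in for the JVM state server (hypothetical, one request only)."""
    conn, _ = listener.accept()
    with conn:
        # Step 3: read the first message and learn that more data follows.
        (length,) = struct.unpack(">I", recv_exact(conn, 4))
        # Step 4: read the follow-up data.
        received.append(recv_exact(conn, length))
        # Step 5: process the request and send the response back.
        conn.sendall(b"OK")


def request(port, payload):
    """Stand-in for the Python worker (hypothetical)."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        # Step 1: first small write (the "proto message" in the description).
        sock.sendall(struct.pack(">I", len(payload)))
        # Step 2: second small write, issued before any response is read.
        sock.sendall(payload)
        # Wait for the response from step 5.
        return recv_exact(sock, 2)
```

Two small writes followed by a blocking read is the pattern that interacts 
badly with Nagle's algorithm and delayed ACK.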

Due to delayed ACK, the JVM does not send an ACK back to the Python worker even 
after step 3. It waits for outgoing data to piggyback the ACK on, or for more 
segments to acknowledge, but the JVM is not going to send any data during that 
phase.

Due to Nagle's algorithm, the message from step 2 is not sent to the JVM 
because the message from step 1 has not been ACKed yet.

This deadlock situation is resolved only when the delayed ACK timer expires, 
which takes 40ms (the minimum duration) on Linux. After the timeout, the JVM 
sends the ACK back to the Python worker, and Nagle's algorithm finally allows 
the message from step 2 to be sent to the JVM.
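
The timing can be approximated with a toy model. This is a simplification 
under stated assumptions, not a measurement: both writes are smaller than one 
segment, the receiver has nothing to send back, and the function name and the 
rtt_ms default are hypothetical.

```python
# Minimum delayed-ACK timeout on Linux, as stated in the description.
DELAYED_ACK_TIMEOUT_MS = 40.0


def second_send_time_ms(nagle_enabled, rtt_ms=0.1,
                        delayed_ack_ms=DELAYED_ACK_TIMEOUT_MS):
    """Time (ms) at which the follow-up write actually leaves the sender.

    Model: the first write goes out at t=0 and the application issues the
    second write immediately afterwards.
    """
    if not nagle_enabled:
        # Without Nagle's algorithm the small write is sent right away.
        return 0.0
    # With Nagle's algorithm the second small write is held until the first
    # one is ACKed. The receiver delays that ACK because it has no data to
    # piggyback it on, so the ACK arrives one round trip plus the delayed-ACK
    # timeout after the first write.
    return rtt_ms / 2 + delayed_ack_ms + rtt_ms / 2
```

In this model the second write is held back for at least the delayed-ACK 
timeout when Nagle's algorithm is enabled, and not at all when it is disabled.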

See the articles below for a more general explanation:
 * [https://engineering.avast.io/40-millisecond-bug/]
 ** Start reading from the "Nagle's algorithm" section
 * [https://brooker.co.za/blog/2024/05/09/nagle.html]

Nagle's algorithm reduces the number of small packets, which, as the above 
article states, could help keep routers from being overloaded. We connect to 
"localhost" here, so that benefit does not apply.

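On the Python side, Nagle's algorithm is disabled per socket with the 
TCP_NODELAY option. The snippet below is a minimal sketch and not the actual 
change for this ticket; connect_without_nagle is a hypothetical helper name.

```python
import socket


def connect_without_nagle(host, port):
    """Open a TCP connection with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.create_connection((host, port))
    # TCP_NODELAY=1 makes small writes go out immediately instead of waiting
    # for the ACK of previously sent data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

On the JVM side, java.net.Socket offers setTcpNoDelay(true) for the same 
purpose.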

