Callum Dempsey Leach created SPARK-52571:
--------------------------------------------

             Summary: ExecuteGrpcResponseSender Deadline Exceeded Occurs when 
Size Limits are hit
                 Key: SPARK-52571
                 URL: https://issues.apache.org/jira/browse/SPARK-52571
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 4.0.0
         Environment: *Spark Version:* 4.0.0
*Deployment:* Docker container using {{apache/spark:4.0.0}}
*Client:* Scala application using Spark Connect
*Data Source:* Delta tables on S3 (s3a://)
*Dataset Size:* 20+ million rows

*Docker Configuration:*

{code:yaml}
services:
  spark:
    image: apache/spark:4.0.0
    mem_limit: 12g
    environment:
      SPARK_MODE: master
    ports:
      - "15002:15002"    # Spark Connect gRPC
{code}

*Spark Connect Server Configuration:*

{code:bash}
/opt/spark/sbin/start-connect-server.sh \
  --conf spark.driver.memory=10g \
  --conf spark.driver.maxResultSize=8g \
  --conf spark.connect.execute.reattachable.senderMaxStreamDuration=1200s \
  --conf spark.connect.execute.reattachable.senderMaxStreamSize=2g \
  --conf spark.connect.grpc.maxInboundMessageSize=268435456 \
  --conf spark.connect.grpc.deadline=1200s \
  --conf spark.network.timeout=1200s
{code}
            Reporter: Callum Dempsey Leach
             Fix For: 4.0.1


h4. Issue

When streaming large result sets (20M+ rows) from Delta tables over Spark 
Connect, {{ExecuteGrpcResponseSender}} runs into {{DEADLINE_EXCEEDED}} errors 
because of the default timeout configuration, and misleading error reporting 
makes the root cause hard to diagnose.
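
For context, a minimal sketch of the kind of client code that hits this; the endpoint and table path are placeholders, and any large {{collect()}}/{{toLocalIterator()}} over Spark Connect should behave the same:

{code:scala}
// Repro sketch (not the exact application): stream a large Delta result back
// through Spark Connect. Endpoint and table path are placeholders.
import org.apache.spark.sql.SparkSession

object LargeResultRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .remote("sc://localhost:15002") // Spark Connect gRPC port from the Docker setup above
      .getOrCreate()

    val df = spark.read.format("delta").load("s3a://some-bucket/some-table")

    // Pulling 20M+ rows keeps ExecuteGrpcResponseSender streaming long enough
    // to run into the sender's time/size limits with the default settings.
    val it = df.toLocalIterator()
    var count = 0L
    while (it.hasNext) { it.next(); count += 1 }
    println(s"fetched $count rows")

    spark.stop()
  }
}
{code}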

1. The default {{senderMaxStreamDuration}} of 2 minutes is inadequate for 
long-running queries that stream substantial amounts of data back to the 
client. This setting is also missing from the Configuration docs and should be 
documented.

2. Even after configuring this, I was still hitting problems. Looking at the 
source of 
[{{ExecuteGrpcResponseSender.scala}}|https://github.com/apache/spark/blob/master/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala],
 the limit check covers both time and size, but the log only ever reports 
"Deadline reached":

{code:scala}
// Around line 240: the condition checks BOTH time AND size
def deadlineLimitReached =
  sentResponsesSize > maximumResponseSize || deadlineTimeNs < System.nanoTime()
{code}

So users see "Deadline reached" even when the stream was actually terminated by 
the size limit.

As a user, I was focused on the time-based configuration when the real issue 
was also the size-based limit, 
{{CONNECT_EXECUTE_REATTACHABLE_SENDER_MAX_STREAM_SIZE}}. I resolved it with the 
following settings, which gave me roughly 500K rows/s of throughput from my 
Delta table in S3:

{code:bash}
--conf spark.connect.execute.reattachable.senderMaxStreamDuration=1200s
--conf spark.connect.execute.reattachable.senderMaxStreamSize=2g
--conf spark.connect.grpc.maxInboundMessageSize=268435456
--conf spark.connect.grpc.deadline=1200s
--conf spark.network.timeout=1200s
{code}
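
For rough sizing, a back-of-envelope calculation (assuming ~100 bytes per serialized row, an illustrative figure rather than a measurement) shows why the 1GB default {{senderMaxStreamSize}} is exceeded well before a 20M-row result finishes streaming:

{code:scala}
// Back-of-envelope sizing sketch. The 100-byte average row size is an
// assumption for illustration only; measure real row sizes before tuning.
val rows = 20000000L
val avgRowBytes = 100L
val totalBytes = rows * avgRowBytes              // ~2 GB of response data
val defaultMaxStreamBytes = 1024L * 1024 * 1024  // 1 GB default senderMaxStreamSize
println(totalBytes > defaultMaxStreamBytes)      // true -> size limit trips mid-stream
{code}
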
h4. Proposed Improvements
 # *Documentation Enhancement*:
 ** Add clear documentation about timeout configurations for long-running 
streaming queries
 ** Provide guidance on sizing timeouts and stream size limits based on 
expected data volumes
 # *Default Value Review*:
 ** Consider increasing the default {{senderMaxStreamDuration}} from 2m to a 
more practical value (e.g., 10m)
 ** Evaluate whether the default {{senderMaxStreamSize}} of 1GB is adequate for 
large-scale streaming scenarios
 # *Better Error Messages* (see the sketch after this list):
 ** Fix the misleading "Deadline reached" message that can be triggered by the 
size limit
 ** Distinguish between time-based and size-based stream termination in log 
messages
 ** Improve error messages to clearly indicate whether the stream was 
terminated by the time limit or the size limit
 ** Suggest the relevant configuration parameters in timeout/size error 
messages
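
A hedged sketch of what the size-vs-time distinction could look like; the value names mirror the {{deadlineLimitReached}} check quoted above, but the helper itself is illustrative, not current Spark code:

{code:scala}
// Illustrative only (not current Spark code): report which limit actually
// ended the stream instead of a single "Deadline reached" message.
def limitReachedReason(
    sentResponsesSize: Long,
    maximumResponseSize: Long,
    deadlineTimeNs: Long): Option[String] = {
  val sizeHit = sentResponsesSize > maximumResponseSize
  val timeHit = deadlineTimeNs < System.nanoTime()
  if (sizeHit) {
    Some(s"Stream size limit reached ($sentResponsesSize > $maximumResponseSize bytes); " +
      "consider increasing spark.connect.execute.reattachable.senderMaxStreamSize")
  } else if (timeHit) {
    Some("Stream duration limit reached; consider increasing " +
      "spark.connect.execute.reattachable.senderMaxStreamDuration")
  } else {
    None
  }
}
{code}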


