Re: [I] Investigate TPC-H q4 hanging when not enough memory is allocated [datafusion-comet]

via GitHub Tue, 18 Mar 2025 20:08:30 -0700


Kontinuation commented on issue #1523:
URL: 
https://github.com/apache/datafusion-comet/issues/1523#issuecomment-2735209828


   The query hung because the tokio runtime is not configured with enough 
blocking threads.
   
   In the merge phase, each spill file is wrapped by a stream backed by a 
blocking thread (see 
[read_spill_as_stream](https://github.com/apache/datafusion/blob/46.0.1/datafusion/physical-plan/src/spill.rs#L44-L55)),
 so merging the spilled data needs at least 183 blocking threads when there are 
183 spill files. The default number of blocking threads is 10, which makes the 
query hang indefinitely.
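
   To illustrate the failure mode, here is a minimal std-only sketch. It is not the Comet or DataFusion code: `spawn_pool` and `merge_spills` are hypothetical names, the fixed worker pool only stands in for the runtime's capped blocking-thread pool, and the bounded buffer of 2 batches per stream is an assumption made for the illustration. Each reader occupies a worker until its stream is drained, while the merge needs the first batch of every stream before it can make progress, so with more spill files than workers the merge waits on a stream whose reader never gets scheduled. The sketch uses a timeout to report the stall instead of hanging.

```rust
use std::sync::mpsc::{self, sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a capped blocking-thread pool:
/// `max_threads` workers pull jobs from a shared queue; extra jobs wait.
fn spawn_pool(max_threads: usize) -> mpsc::Sender<Job> {
    let (tx, rx) = mpsc::channel::<Job>();
    let rx = Arc::new(Mutex::new(rx));
    for _ in 0..max_threads {
        let rx = Arc::clone(&rx);
        thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement,
            // so the job itself runs without holding the queue lock.
            let job = rx.lock().unwrap().recv();
            match job {
                Ok(job) => job(),
                Err(_) => break, // queue closed: worker exits
            }
        });
    }
    tx
}

/// Returns true if the merge obtained a first batch from every spill
/// stream, false if it stalled waiting for one of them.
fn merge_spills(num_spills: usize, max_threads: usize, batches_per_spill: usize) -> bool {
    let pool = spawn_pool(max_threads);
    let mut streams: Vec<Receiver<u64>> = Vec::new();
    for i in 0..num_spills {
        // Assumed small bounded buffer: the reader blocks once it is full.
        let (tx, rx) = sync_channel::<u64>(2);
        streams.push(rx);
        pool.send(Box::new(move || {
            for b in 0..batches_per_spill {
                // Stop reading if the consumer has gone away.
                if tx.send((i * batches_per_spill + b) as u64).is_err() {
                    return;
                }
            }
        }))
        .unwrap();
    }
    // A k-way merge needs the head of every input before emitting output.
    for rx in &streams {
        if rx.recv_timeout(Duration::from_millis(500)).is_err() {
            return false; // this stream's reader never got a thread
        }
    }
    true
}
```

   In this sketch, `merge_spills(8, 2, 10)` reports a stall because only two readers ever run, whereas `merge_spills(8, 8, 10)` proceeds because every stream has a worker.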
   
   Tuning `spark.comet.blockingThreads` to a higher value could resolve this 
problem. We may consider raising the default value of 
`spark.comet.blockingThreads`, or improving sort-merge in DataFusion so that it 
does not spawn so many blocking threads.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

