paleolimbot opened a new pull request, #558:
URL: https://github.com/apache/sedona-db/pull/558

   I had intended to post a reprex to GeoPandas regarding threading but was 
caught by this issue: the way we collected results into Python caused many 
attempts to acquire the GIL, which interfered with UDF execution.
   
   Briefly, before this PR, the Python bindings always collected via a special 
`RecordBatchReader` that called `block_on()` to wait for the next batch in the 
output `SendableRecordBatchStream`. To ensure cancellation requests worked, we 
acquired the GIL once per second to check for signals.
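   
   As a rough illustration of that wait-and-poll pattern (not the actual 
bindings code), the sketch below waits on a queue and wakes up on a timeout to 
run a signal check. The names `next_batch`, `check_signals`, and 
`poll_seconds` are hypothetical, and a `queue.Queue` stands in for the output 
stream.

```python
import queue
import threading
import time


def next_batch(batches, check_signals, poll_seconds=1.0):
    """Wait for the next item, waking up periodically to check for signals."""
    while True:
        try:
            return batches.get(timeout=poll_seconds)
        except queue.Empty:
            # Assumption: this is the point where the real bindings would
            # briefly take the GIL to look for a pending cancellation.
            check_signals()


checks = []
batches = queue.Queue()

# Hypothetical producer that delivers one batch after a short delay.
producer = threading.Thread(
    target=lambda: (time.sleep(0.05), batches.put("batch-0"))
)
producer.start()

# A short poll interval is used here only so the example finishes quickly.
result = next_batch(batches, lambda: checks.append(1), poll_seconds=0.01)
producer.join()
```

   Every wake-up in this loop is one more attempt to take the lock, which is 
the contention described here.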
   
   This constant `block_on()` + GIL acquisition caused a deadlock when Python 
UDFs were also trying to acquire the GIL.
   
   The workaround here is not a full solution but covers the most common case, 
where a user wants to collect the entire result (e.g., `.to_pandas()`). That 
case is simpler to orchestrate than batch-by-batch streaming.
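   
   A toy model of why collecting everything at once is easier to orchestrate, 
under loud assumptions: a plain `threading.Lock` named `gil` stands in for the 
real GIL, and `fake_udf`, `produce_batches`, and `collect_all` are 
hypothetical names, not functions from this PR. The caller lets go of the lock 
once, waits for the whole result, and only then takes the lock again, so the 
"UDF" never competes with a periodic re-acquisition.

```python
import threading

# Stand-in for the GIL; this is an ordinary lock, not the interpreter lock.
gil = threading.Lock()


def fake_udf(x):
    # A "Python UDF" has to hold the lock while it runs.
    with gil:
        return x * 2


def produce_batches(n_batches, out):
    # Stand-in for the engine driving the output stream on a worker thread.
    for i in range(n_batches):
        out.append([fake_udf(v) for v in range(i * 3, i * 3 + 3)])


def collect_all(n_batches):
    batches = []
    worker = threading.Thread(target=produce_batches, args=(n_batches, batches))
    with gil:
        # The caller "holds the GIL" while setting up the collection.
        worker.start()
    # Lock released for the whole collection: UDF calls run uncontended.
    worker.join()
    with gil:
        # Reacquire once at the end to build the Python-side result.
        return [v for batch in batches for v in batch]


collected = collect_all(2)
```

   The sketch ignores cancellation entirely; handling signals while the lock 
is released is the part the streaming case still has to solve.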


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.