zhangfengcdt commented on code in PR #362:
URL: https://github.com/apache/sedona-db/pull/362#discussion_r2578237245


##########
rust/sedona-spatial-join/src/build_index.rs:
##########
@@ -33,7 +33,72 @@ use crate::{
     spatial_predicate::SpatialPredicate,
 };
 
-pub(crate) async fn build_index(
+/// Synchronous version of build_index that doesn't spawn tasks
+/// Used in execution contexts without async runtime support (e.g., 
Spark/Comet JNI)
+pub async fn build_index_sync(

Review Comment:
   @Kontinuation Yes, I think the function name might be a bit confusing, I can 
rename it to `build_index_sequential` to better convey that it avoids task 
spawning. 
   
   So basically, the reason we need this (instead of using build_index 
directly) is because the limitation with JoinSet::spawn() in JNI contexts: when 
running in a Spark/Comet JNI context, JoinSet::spawn() fails because:
     1. The tokio runtime in JNI contexts may be single-threaded or have 
limited threading capabilities
     2. JoinSet::spawn() requires a multi-threaded runtime to spawn new tasks
   
   Even with single-partition DataFusion plans (which is typical for Comet), 
collect_all still calls JoinSet::spawn():
   
   ```
     // In collect_all:
     let mut join_set = JoinSet::new();
     for ... {
         join_set.spawn(async move { ... });  // <- This fails in JNI contexts
     }
     join_set.join_all().await;  // Wait for all tasks
   ```
   The issue isn't about parallelism benefits (single partition means nothing 
to parallelize anyway), but about JoinSet::spawn() itself not working in the 
JNI execution environment.
   
   When I call `build_index` directly in JNI context, it hanged forever. I 
guess this might be caused by a deadlock by JoinSet::spawn().
   
   In a JNI context:
     1. When JoinSet::spawn() is called, it schedules new tasks to run on the 
runtime's worker threads
     2. Then join_all().await blocks the current thread waiting for those 
spawned tasks
     3. But if the current thread IS the only worker thread (or all workers are 
blocked), no one can execute the spawned tasks
     4. Deadlock: the current thread waits for tasks that need the current 
thread to execute
   
   It would be ideal to reuse build_index if there is a workaround for it, but 
I also think using a clean `build_index_sequential` makes sense because this is 
called in a distributed framework and the parallelism is provided and dictated 
by the distributed schedulings themselves.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to