timsaucer opened a new issue, #16312:
URL: https://github.com/apache/datafusion/issues/16312

   Please see the discussion in the original post below.
   
   Some users want to spawn tasks during calls such as `execute`. For 
pure Rust implementations without FFI this is not a problem. However, when using 
the FFI layer, such as with `datafusion-python`, we can end up outside the 
correct runtime for spawning tasks. Currently we enter the 
runtime during calls to `poll_next` in `FFI_RecordBatchStream`. We need 
to identify all of the other places where we should enter the runtime when 
called from foreign code.
   
   The following patch resolved the problem described in [the 
discussion](https://github.com/apache/datafusion/discussions/15691) and [this 
issue](https://github.com/lancedb/lance/issues/3953).
   
   ```
   diff --git a/datafusion/ffi/src/execution_plan.rs b/datafusion/ffi/src/execution_plan.rs
   index 00602474d..d6b292450 100644
   --- a/datafusion/ffi/src/execution_plan.rs
   +++ b/datafusion/ffi/src/execution_plan.rs
   @@ -112,6 +112,7 @@ unsafe extern "C" fn execute_fn_wrapper(
        let plan = &(*private_data).plan;
        let ctx = &(*private_data).context;
        let runtime = (*private_data).runtime.clone();
   +    let _guard = runtime.as_ref().map(|rt| rt.enter());
   
        rresult!(plan
            .execute(partition, Arc::clone(ctx))
   ```
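
The guard in the patch works because spawning looks up an ambient, per-thread runtime context, and entering a runtime sets that context until the guard is dropped. The sketch below is a toy, std-only model of that mechanism, not tokio's implementation. `ToyRuntime`, `EnterGuard`, `toy_spawn`, and `execute_wrapper` are all hypothetical names made up for illustration.

```rust
use std::cell::RefCell;

thread_local! {
    // Name of the runtime "entered" on this thread, if any.
    // Toy stand-in for a real runtime's thread-local context.
    static CURRENT: RefCell<Option<String>> = RefCell::new(None);
}

/// Hypothetical runtime handle (assumption, not a DataFusion or tokio type).
struct ToyRuntime {
    name: String,
}

/// RAII guard that restores the previous context when dropped.
struct EnterGuard {
    previous: Option<String>,
}

impl ToyRuntime {
    /// Make this runtime the current one for the calling thread.
    fn enter(&self) -> EnterGuard {
        let previous = CURRENT.with(|c| c.replace(Some(self.name.clone())));
        EnterGuard { previous }
    }
}

impl Drop for EnterGuard {
    fn drop(&mut self) {
        let previous = self.previous.take();
        CURRENT.with(|c| *c.borrow_mut() = previous);
    }
}

/// Stand-in for a spawn call: fails when no runtime has been entered.
fn toy_spawn(task: &str) -> Result<String, String> {
    CURRENT.with(|c| match c.borrow().as_ref() {
        Some(rt) => Ok(format!("{task} spawned on {rt}")),
        None => Err(format!("{task}: no runtime entered on this thread")),
    })
}

/// Same shape as the patched wrapper: an optional runtime whose guard is
/// held for the duration of the call into the plan.
fn execute_wrapper(runtime: Option<&ToyRuntime>) -> Result<String, String> {
    let _guard = runtime.map(|rt| rt.enter());
    toy_spawn("scan")
}

fn main() {
    let rt = ToyRuntime { name: "provider-runtime".to_string() };
    println!("{:?}", execute_wrapper(None)); // no runtime: spawn fails
    println!("{:?}", execute_wrapper(Some(&rt))); // guard held: spawn succeeds
    println!("{:?}", toy_spawn("late")); // guard already dropped: fails again
}
```

The same reasoning would apply to any other FFI entry point that may end up spawning: the guard has to be alive at the moment the foreign call runs, not only later when the stream is polled.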
   
   
   ### Discussed in https://github.com/apache/datafusion/discussions/15691
   
   <div type='discussions-op-text'>
   
   <sup>Originally posted by **westonpace** April 11, 2025</sup>
   A contributor of ours (Lance) hit a bit of a snag implementing a foreign 
table provider for DataFusion Python. It turns out the issue is that we spawn 
a background thread in `ExecutionPlan::execute`. The foreign table provider 
code re-attaches the tokio runtime when it polls the returned stream, but it 
doesn't attach it for the call to `execute`. I think there are two ways this 
could be resolved, and I wanted to see if anyone had strong opinions one way or 
the other.
   
   1. I could modify my execution plan implementation (in Lance) so that the 
background thread is not spawned until it is polled for the first time.
   
   2. We could enter the foreign runtime before calling the `execute` function in 
DataFusion.
   
   I guess the more general question is:
   
   Is there an expectation that the `ExecutionPlan` streams do no background 
work until first polled?</div>
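
For comparison, option 1 above (deferring background work until the first poll) could look roughly like the following. This is a std-only sketch using an `Iterator` and a plain thread as stand-ins for an async stream and a spawned task. `LazyBatches` and its methods are hypothetical and are not part of Lance or DataFusion.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical lazy stream: the producer thread is not started until the
/// first call to `next`, so constructing it needs no runtime context.
struct LazyBatches {
    total: u32,
    rx: Option<Receiver<u32>>,
}

impl LazyBatches {
    fn new(total: u32) -> Self {
        Self { total, rx: None }
    }

    /// True once the background producer has been started.
    fn started(&self) -> bool {
        self.rx.is_some()
    }
}

impl Iterator for LazyBatches {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let total = self.total;
        // Start the producer on first use only.
        let rx = self.rx.get_or_insert_with(|| {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                for i in 0..total {
                    // Stop early if the consumer went away.
                    if tx.send(i).is_err() {
                        break;
                    }
                }
            });
            rx
        });
        // `recv` errors once the producer finishes and drops the sender.
        rx.recv().ok()
    }
}

fn main() {
    let mut batches = LazyBatches::new(3);
    println!("started before first poll: {}", batches.started());
    let collected: Vec<u32> = batches.by_ref().collect();
    println!("started after polling: {}", batches.started());
    println!("items: {:?}", collected);
}
```

The trade-off is that every plan implementation has to follow this discipline, whereas entering the runtime in the FFI wrapper handles it in one place.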


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org