timsaucer opened a new issue, #16312:
URL: https://github.com/apache/datafusion/issues/16312
Please see the discussion in the original post below.

Some users wish to spawn tasks during calls such as `execute`. For pure Rust implementations without FFI this isn't a problem. However, when using the FFI layer, such as with `datafusion-python`, we can run into problems where we are not in the correct runtime for spawning tasks. Currently we enter the runtime only during the calls to `poll_next` in `FFI_RecordBatchStream`. We need to identify all of the other places where we should enter the runtime when being called from foreign code.

This patch resolved the problem described in [the discussion](https://github.com/apache/datafusion/discussions/15691) and [this issue](https://github.com/lancedb/lance/issues/3953).

```
diff --git a/datafusion/ffi/src/execution_plan.rs b/datafusion/ffi/src/execution_plan.rs
index 00602474d..d6b292450 100644
--- a/datafusion/ffi/src/execution_plan.rs
+++ b/datafusion/ffi/src/execution_plan.rs
@@ -112,6 +112,7 @@ unsafe extern "C" fn execute_fn_wrapper(
     let plan = &(*private_data).plan;
     let ctx = &(*private_data).context;
     let runtime = (*private_data).runtime.clone();
+    let _guard = runtime.as_ref().map(|rt| rt.enter());
 
     rresult!(plan
         .execute(partition, Arc::clone(ctx))
```

### Discussed in https://github.com/apache/datafusion/discussions/15691

<div type='discussions-op-text'>

<sup>Originally posted by **westonpace** April 11, 2025</sup>

A contributor of ours (Lance) hit a bit of a snag implementing a foreign table provider for datafusion-python. It turns out the issue is that we spawn a background thread in `ExecutionPlan::execute`. The foreign table provider code re-attaches the tokio runtime when it polls the returned stream, but it doesn't attach it for the call to `execute`.

I think there are two ways this could be resolved and wanted to see if anyone had strong opinions one way or the other.

1. I could modify my execution plan implementation (in Lance) so that the background thread is not spawned until the stream is polled for the first time.
2. We could enter the foreign runtime before calling the `execute` function in DataFusion.

I guess the more general question is: is there an expectation that `ExecutionPlan` streams do no background work until first polled?</div>

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
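For reference, option 1 from the quoted discussion (do no background work until the stream is first polled) could look roughly like the sketch below. It is a std-only illustration, not Lance or DataFusion code: `LazyStream`, its `execute` constructor, and `started` are hypothetical names, `Iterator::next` stands in for `poll_next`, and `std::thread` stands in for a spawned tokio task. The point is only the ordering: `execute` records what to do, and the producer starts on the first poll, which is the call the FFI layer already wraps in a runtime guard.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical stand-in for a record batch stream.
/// `Iterator::next` plays the role of `poll_next`.
struct LazyStream {
    partition: usize,
    // `None` until the first call to `next`: no background work has started.
    rx: Option<Receiver<u64>>,
}

impl LazyStream {
    /// Stand-in for `ExecutionPlan::execute`: only records the request.
    fn execute(partition: usize) -> Self {
        LazyStream { partition, rx: None }
    }

    /// True once the background producer has been spawned.
    fn started(&self) -> bool {
        self.rx.is_some()
    }
}

impl Iterator for LazyStream {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let partition = self.partition;
        // Spawn the producer on the first poll only; later polls reuse `rx`.
        let rx = self.rx.get_or_insert_with(|| {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                for i in 0..3u64 {
                    // Stop quietly if the consumer has gone away.
                    if tx.send(partition as u64 * 10 + i).is_err() {
                        break;
                    }
                }
            });
            rx
        });
        // `recv` fails once the producer drops `tx`, which ends the stream.
        rx.recv().ok()
    }
}

fn main() {
    let mut stream = LazyStream::execute(1);
    // Nothing has been spawned by `execute` itself.
    assert!(!stream.started());

    // The first poll starts the producer and yields the first item.
    assert_eq!(stream.next(), Some(10));
    assert!(stream.started());

    for item in stream {
        println!("item: {item}");
    }
}
```

The trade-off is that every plan implementation has to follow this convention, whereas entering the runtime in `execute_fn_wrapper` (option 2, the patch above) fixes it once on the FFI side.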