cisaacson opened a new issue, #11193:
URL: https://github.com/apache/datafusion/issues/11193

   ### Is your feature request related to a problem or challenge?
   
   We need the ability to get the `TaskContext.task_id` any place where a 
Custom Data Source is invoked. As it stands currently, the `state: 
&SessionState` is available in `TableProvider.scan` and `task_ctx: 
Arc<TaskContext>` is available in `ExecutionPlan.execute`, but neither is 
available in `supports_filters_pushdown`. This prevents per-query customization 
or tracking of external state in that method. For example, if there are 3 
`filters` for a custom table and 10 are possible, we need to be able to choose 
the best one at runtime.
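   
   To illustrate the kind of per-query decision meant here, the following is a 
minimal sketch and not DataFusion code. `QueryContext`, `CustomSource`, 
`prefer`, and the extra `ctx` parameter are hypothetical; `FilterPushDown` is a 
local stand-in for `TableProviderFilterPushDown`, and filters are plain strings 
instead of `Expr` so the example compiles on its own.

```rust
use std::collections::HashMap;

// Local stand-in for DataFusion's `TableProviderFilterPushDown`,
// redefined here so the sketch has no external dependencies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterPushDown {
    Unsupported,
    Inexact,
    Exact,
}

// Hypothetical per-query handle; only the `task_id` matters for this sketch.
pub struct QueryContext {
    pub task_id: Option<String>,
}

// Hypothetical custom source that records, per task, which filter it prefers.
pub struct CustomSource {
    // task_id -> index of the filter chosen for that query
    preferred: HashMap<String, usize>,
}

impl CustomSource {
    pub fn new() -> Self {
        Self { preferred: HashMap::new() }
    }

    // Record the filter index that should be pushed down for a given task.
    pub fn prefer(&mut self, task_id: &str, filter_index: usize) {
        self.preferred.insert(task_id.to_string(), filter_index);
    }

    // Hypothetical signature: the `ctx` argument is the addition requested
    // in this issue. Filters are strings here instead of `Expr`.
    pub fn supports_filters_pushdown(
        &self,
        ctx: &QueryContext,
        filters: &[&str],
    ) -> Vec<FilterPushDown> {
        // Look up the per-query choice; without a task_id nothing is chosen.
        let chosen = ctx
            .task_id
            .as_ref()
            .and_then(|id| self.preferred.get(id))
            .copied();
        (0..filters.len())
            .map(|i| {
                if Some(i) == chosen {
                    FilterPushDown::Exact
                } else {
                    FilterPushDown::Unsupported
                }
            })
            .collect()
    }
}
```

   With a context available, the source can look up state keyed by `task_id` 
and mark only the filter chosen for that query as pushed down; without one it 
falls back to `Unsupported` for every filter.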
   
   Further, the `task_id` should always be available, either by passing the 
`TaskContext` or from `SessionState`, to keep things consistent.
   
   Trying to implement this proved infeasible because 
`supports_filters_pushdown` is declared in 2 interfaces in 2 separate crates: 
`TableProvider` (in `core`) and `TableSource` (in `expr`). It is not possible 
to add `state: &SessionState` to `TableSource` because `expr` cannot depend on 
the `core` crate: `core` already depends on `expr`, so a cyclic dependency 
would occur. The split was intentional to make `LogicalPlan` separable, which 
makes sense, but it prevents this type of enhancement.
   
   
   ### Describe the solution you'd like
   
   Add `&SessionState` or minimally `TaskContext` in every pertinent method for 
per-query specific processing in a custom data source. 
   
   A possible way to solve this is to make a new `datafusion-traits` crate, and 
to move `SessionState` and other common items to `datafusion-common`, so that 
these components can be used by both `core` and `expr`. It will make some 
components available in `expr` that are not strictly necessary, but I think 
that is a good trade-off. This work could be combined with other efforts to 
break `core` into more sub-crates, which would make DataFusion much more 
flexible overall. 
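   
   One way the shared-crate idea could look is sketched below. This is an 
assumption-laden illustration rather than a design: each module stands in for a 
crate, and `SessionInfo`, `TaskAwareSource`, and `pushdown_for` are hypothetical 
names. Note that it is a variation on the proposal above: instead of moving 
`SessionState` itself, it keeps the concrete type in `core` and shares only a 
small trait, which is enough to avoid the cycle.

```rust
// Each module stands in for a crate. Only `core` and `expr` are real crate
// names; everything else here is hypothetical.
mod traits_crate {
    // Hypothetical `datafusion-traits`: depends on neither `core` nor `expr`.
    pub trait SessionInfo {
        fn task_id(&self) -> Option<String>;
    }
}

mod expr_crate {
    use super::traits_crate::SessionInfo;

    // Simplified `TableSource`: it sees the session only through the shared
    // trait, so `expr` never needs to depend on `core`.
    pub trait TableSource {
        fn supports_filters_pushdown(
            &self,
            session: &dyn SessionInfo,
            filters: &[&str],
        ) -> Vec<bool>;
    }
}

mod core_crate {
    use super::traits_crate::SessionInfo;

    // Simplified `SessionState`: the concrete type stays in `core`.
    pub struct SessionState {
        pub task_id: Option<String>,
    }

    impl SessionInfo for SessionState {
        fn task_id(&self) -> Option<String> {
            self.task_id.clone()
        }
    }
}

use traits_crate::SessionInfo;

// Hypothetical source that only pushes filters down when the query is known.
pub struct TaskAwareSource;

impl expr_crate::TableSource for TaskAwareSource {
    fn supports_filters_pushdown(
        &self,
        session: &dyn SessionInfo,
        filters: &[&str],
    ) -> Vec<bool> {
        let known = session.task_id().is_some();
        vec![known; filters.len()]
    }
}

// Usage helper: builds a session in `core` and queries the source through
// the trait defined in `expr`.
pub fn pushdown_for(task_id: Option<&str>, filters: &[&str]) -> Vec<bool> {
    use expr_crate::TableSource;
    let state = core_crate::SessionState {
        task_id: task_id.map(str::to_string),
    };
    TaskAwareSource.supports_filters_pushdown(&state, filters)
}
```

   The dependency arrows in this sketch point only from `core` and `expr` 
toward the shared trait module, which is the property the restructuring is 
meant to achieve.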
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   Restructuring crates in a project of this size will be a lot of work, but I 
believe the benefit will be there. Other issues would also benefit. I would 
recommend a separate restructuring ticket that can be reviewed before any 
implementation is attempted. In addition, this would need to be implemented by 
multiple contributors, and it will inevitably cause a lot of temporary breakage 
and require retesting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

