linhr opened a new issue, #20547:
URL: https://github.com/apache/datafusion/issues/20547

   ### Is your feature request related to a problem or challenge?
   
   Right now `DefaultPhysicalPlanner::map_logical_node_to_physical()` calls 
`source_as_provider()` and returns an error if the `TableSource` inside 
`LogicalPlan::TableScan` is not a `DefaultTableSource` (which wraps a 
`TableProvider`).
   
   `TableSource` was introduced so that logical planning does not depend on 
`TableProvider`, which has a broader set of responsibilities spanning both 
logical planning and physical execution (`TableProvider::scan()`).
   
   However, I feel `TableSource` is a valuable abstraction for logical 
planning in its own right, so it would be good if users could customize the 
physical planning for it. In some use cases, the user may want to implement 
`TableSource` purely as a logical representation of a data source, without 
coupling the scanning logic to the same struct. If the physical planner 
accepted custom `TableSource` implementations in `LogicalPlan::TableScan`, 
custom data sources could benefit from logical optimizations such as filter 
pushdown, projection pruning, and fetch limit pushdown.
   
   ### Describe the solution you'd like
   
   A trait method `ExtensionPlanner::plan_table_scan()` would be helpful. It 
would let the user inject physical planning logic for a 
`LogicalPlan::TableScan` that contains a custom `TableSource` implementation. 
If none of the registered extension planners returns a physical plan, the 
planner falls back to the existing logic, which assumes the `TableSource` 
wraps a `TableProvider` and continues planning from there.
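
   A minimal sketch of the dispatch order described above, using simplified 
stand-in types rather than DataFusion's real ones. `CatalogOnlySource`, 
`CatalogScanPlanner`, `PhysicalPlan`, and `plan_scan` are hypothetical names, 
and the `plan_table_scan` signature is only an assumption; the real method 
would presumably be async and take the planner, the scan, and the session 
state, like `plan_extension()` does.

```rust
use std::any::Any;

// Simplified stand-in for the logical-only trait; NOT DataFusion's definition.
trait TableSource {
    fn as_any(&self) -> &dyn Any;
}

// Stand-in for `DefaultTableSource`, which wraps a `TableProvider`.
struct DefaultTableSource {
    provider_name: String,
}

impl TableSource for DefaultTableSource {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical source that is purely logical and carries no scan logic.
struct CatalogOnlySource {
    table: String,
}

impl TableSource for CatalogOnlySource {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Stand-in for a physical plan node.
#[derive(Debug, PartialEq)]
enum PhysicalPlan {
    ProviderScan(String),
    CustomScan(String),
}

trait ExtensionPlanner {
    // Proposed hook (assumed shape): `Ok(None)` means "not my source".
    fn plan_table_scan(&self, _source: &dyn TableSource) -> Result<Option<PhysicalPlan>, String> {
        Ok(None)
    }
}

// Hypothetical planner that only understands `CatalogOnlySource`.
struct CatalogScanPlanner;

impl ExtensionPlanner for CatalogScanPlanner {
    fn plan_table_scan(&self, source: &dyn TableSource) -> Result<Option<PhysicalPlan>, String> {
        Ok(source
            .as_any()
            .downcast_ref::<CatalogOnlySource>()
            .map(|s| PhysicalPlan::CustomScan(s.table.clone())))
    }
}

// Models today's behavior: error unless the source is the default wrapper.
fn source_as_provider(source: &dyn TableSource) -> Result<String, String> {
    source
        .as_any()
        .downcast_ref::<DefaultTableSource>()
        .map(|s| s.provider_name.clone())
        .ok_or_else(|| "TableSource was not DefaultTableSource".to_string())
}

// Ask each extension planner first, then fall back to the provider path.
fn plan_scan(
    planners: &[Box<dyn ExtensionPlanner>],
    source: &dyn TableSource,
) -> Result<PhysicalPlan, String> {
    for planner in planners {
        if let Some(plan) = planner.plan_table_scan(source)? {
            return Ok(plan);
        }
    }
    source_as_provider(source).map(PhysicalPlan::ProviderScan)
}
```

   With `CatalogScanPlanner` registered, a `CatalogOnlySource` is planned by 
the extension planner, while a `DefaultTableSource` still goes through the 
fallback. With no planner registered, the custom source hits the same error 
as today.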
   
   ### Describe alternatives you've considered
   
   It is possible to work around this problem in the current setup. The idea is 
to first traverse the logical plan tree and convert each 
`LogicalPlan::TableScan` to a `LogicalPlan::Extension`, and then implement an 
`ExtensionPlanner` that converts the logical extension node to a physical plan 
node. This requires more boilerplate code (a `UserDefinedLogicalNode` that 
serves as the "bridge", plus a logical plan rewriter).
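
   The rewriting half of this workaround can be sketched roughly as follows. 
This uses a toy plan enum, not DataFusion's `LogicalPlan`; the `custom_source` 
flag, the `BridgeScan` node name, and `bridge_custom_scans` are hypothetical 
and only illustrate the shape of the traversal.

```rust
// Toy logical plan; the real `LogicalPlan` has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    // `custom_source` marks scans whose source does not wrap a provider.
    TableScan { table: String, custom_source: bool },
    // Stand-in for `LogicalPlan::Extension` holding a user-defined node.
    Extension { node: String },
    Filter { predicate: String, input: Box<LogicalPlan> },
}

// Recursively replace custom table scans with a "bridge" extension node,
// leaving provider-backed scans and all other nodes untouched.
fn bridge_custom_scans(plan: LogicalPlan) -> LogicalPlan {
    match plan {
        LogicalPlan::TableScan { table, custom_source: true } => LogicalPlan::Extension {
            node: format!("BridgeScan({})", table),
        },
        LogicalPlan::Filter { predicate, input } => LogicalPlan::Filter {
            predicate,
            input: Box::new(bridge_custom_scans(*input)),
        },
        other => other,
    }
}
```

   An `ExtensionPlanner` would then recognize the bridge node and produce the 
physical scan. The proposed `plan_table_scan()` hook would make both the 
bridge node and this rewrite pass unnecessary.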
   
   ### Additional context
   
   `source_as_provider()` is also used by the logical plan Protobuf codec. For 
consistency, it might be good to support `TableSource`s that are not 
`TableProvider`s in the logical plan Protobuf codec as well, but that would 
require some breaking changes in `LogicalExtensionCodec`, as well as adding 
FFI support for `TableSource`. This is out of scope for this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

