alamb commented on code in PR #13527:
URL: https://github.com/apache/datafusion/pull/13527#discussion_r1899131349
##########
datafusion-examples/examples/composed_extension_codec.rs:
##########
@@ -71,8 +71,10 @@ async fn main() {
// deserialize proto back to execution plan
let runtime = ctx.runtime_env();
+ let state = ctx.state();
+ let config_options = state.config_options();
let result_exec_plan: Arc<dyn ExecutionPlan> = proto
- .try_into_physical_plan(&ctx, runtime.deref(), &composed_codec)
+ .try_into_physical_plan(&ctx, config_options, runtime.deref(),
&composed_codec)
Review Comment:
This is an API change, but I think it is required to thread the config
options through as each argument is specific
We could potentially improve the `try_into_physical_plan` API (as a follow
on PR) to make it easier to update the API in the future using a trait or
something like
https://github.com/apache/datafusion/blob/e99e02b9b9093ceb0c13a2dd32a2a89beba47930/datafusion/expr/src/expr_schema.rs#L39-L38
So this would look something like
```rust
pub trait ProtobufContext {
/// return a function registry
fn function_registry(&self) -> &dyn FunctionRegistry;
/// return the runtime env
fn runtime_env(&self) -> &RuntimeEnv;
/// return the config options
fn config_options(&self) -> &ConfigOptions;
/// return extension codec
fn extension_codec(&self) -> &dyn PhysicalExtensionCodec;
}
```
```rust
impl AsExecutionPlan for protobuf::PhysicalPlanNode {
...
fn try_into_physical_plan(
&self,
registry: &dyn FunctionRegistry,
config_options: &ConfigOptions,
runtime: &RuntimeEnv,
extension_codec: &dyn PhysicalExtensionCodec,
) -> Result<Arc<dyn ExecutionPlan>> {
##########
datafusion-examples/examples/pruning.rs:
##########
@@ -187,7 +188,9 @@ impl PruningStatistics for MyCatalog {
fn create_pruning_predicate(expr: Expr, schema: &SchemaRef) ->
PruningPredicate {
let df_schema = DFSchema::try_from(schema.as_ref().clone()).unwrap();
let props = ExecutionProps::new();
- let physical_expr = create_physical_expr(&expr, &df_schema,
&props).unwrap();
+ let config_options = ConfigOptions::default();
Review Comment:
Something we could potentially do to make this API slightly easier to use
might be to create a static default `ConfigOptions`
Something like
```rust
impl ConfigOptions {
/// returns a reference to default ConfigOptions
pub fn default_singleton() -> &'static ConfigOptions
}
```
This would then make it easier to return `&ConfigOptions` in various places
when only the default was needed
For example, then in `LocalCsvTableFunc` you could avoid having to thread
the `ConfigOptions` through as in that example having the actual config options
isn't important
##########
datafusion/physical-expr/src/scalar_function.rs:
##########
@@ -243,6 +292,7 @@ pub fn create_physical_expr(
Arc::new(fun.clone()),
input_phy_exprs.to_vec(),
return_type,
+ Arc::new(config_options.clone()),
Review Comment:
This line worries me -- it means each distinct scalar function in the plan
will get an entirely new copy of the ConfigOptions.
I think the `Arc` should be passed in as the argument like this (I realize
this will be a significant code change) so that the config options are copied
at most once per plan
```rust
/// Create a physical expression for the UDF.
pub fn create_physical_expr(
fun: &ScalarUDF,
input_phy_exprs: &[Arc<dyn PhysicalExpr>],
input_schema: &Schema,
args: &[Expr],
input_dfschema: &DFSchema,
config_options: &Arc<ConfigOptions>, // <--- I think this should be an
`Arc`
) -> Result<Arc<dyn PhysicalExpr>> {
```
I think that would also make it clearer that physical planning makes a read
only copy of the configuration (`Arc<ConfigOptions>`) that is then unchanged
during execution
##########
datafusion/core/src/datasource/listing/helpers.rs:
##########
@@ -241,6 +241,7 @@ async fn prune_partitions(
partitions: Vec<Partition>,
filters: &[Expr],
partition_cols: &[(String, DataType)],
+ ctx: &SessionState,
Review Comment:
I recommend passing `&ConfigOptions` directly here as the function only
needs the config, not the entire session state
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]