alamb commented on code in PR #13527: URL: https://github.com/apache/datafusion/pull/13527#discussion_r1899131349
########## datafusion-examples/examples/composed_extension_codec.rs: ########## @@ -71,8 +71,10 @@ async fn main() { // deserialize proto back to execution plan let runtime = ctx.runtime_env(); + let state = ctx.state(); + let config_options = state.config_options(); let result_exec_plan: Arc<dyn ExecutionPlan> = proto - .try_into_physical_plan(&ctx, runtime.deref(), &composed_codec) + .try_into_physical_plan(&ctx, config_options, runtime.deref(), &composed_codec) Review Comment: This is an API change, but I think it is required to thread the config options through as each argument is specific We could potentially improve the `try_into_physical_plan` API (as a follow on PR) to make it easier to update the API in the future using a trait or something like https://github.com/apache/datafusion/blob/e99e02b9b9093ceb0c13a2dd32a2a89beba47930/datafusion/expr/src/expr_schema.rs#L39-L38 So this would look something like ```rust pub trait ProtobufContext { /// return a function registry fn function_registry(&self) -> &dyn FunctionRegistry; /// return the runtime env fn runtime_env(&self) -> &RuntimeEnv; /// return the config options fn config_options(&self) -> &ConfigOptions; /// return extension codec fn extension_codec(&self) -> &dyn PhysicalExtensionCodec; } ``` ```rust impl AsExecutionPlan for protobuf::PhysicalPlanNode { ... fn try_into_physical_plan( &self, registry: &dyn FunctionRegistry, config_options: &ConfigOptions, runtime: &RuntimeEnv, extension_codec: &dyn PhysicalExtensionCodec, ) -> Result<Arc<dyn ExecutionPlan>> { ########## datafusion-examples/examples/pruning.rs: ########## @@ -187,7 +188,9 @@ impl PruningStatistics for MyCatalog { fn create_pruning_predicate(expr: Expr, schema: &SchemaRef) -> PruningPredicate { let df_schema = DFSchema::try_from(schema.as_ref().clone()).unwrap(); let props = ExecutionProps::new(); - let physical_expr = create_physical_expr(&expr, &df_schema, &props).unwrap(); + let config_options = ConfigOptions::default(); Review Comment: Something we could potentially do to make this API slightly easier to use might be to create a static default `ConfigOptions` Something like ```rust impl ConfigOptions { /// returns a reference to default ConfigOptions pub fn default_singleton() -> &'static ConfigOptions } ``` This would then make it easier to return `&ConfigOptions` in various places when only the default was needed For example, then in `LocalCsvTableFunc` you could avoid having to thread the `ConfigOptions` through as in that example having the actual config options isn't important ########## datafusion/physical-expr/src/scalar_function.rs: ########## @@ -243,6 +292,7 @@ pub fn create_physical_expr( Arc::new(fun.clone()), input_phy_exprs.to_vec(), return_type, + Arc::new(config_options.clone()), Review Comment: This line worries me -- it means each distinct scalar function in the plan will get an entirely new copy of the ConfigOptions. I think the `Arc` should be passed in as the argument like this (I realize this will be a significant code change) so that the config options are copied at most once per plan ```rust /// Create a physical expression for the UDF. pub fn create_physical_expr( fun: &ScalarUDF, input_phy_exprs: &[Arc<dyn PhysicalExpr>], input_schema: &Schema, args: &[Expr], input_dfschema: &DFSchema, config_options: &Arc<ConfigOptions>, // <--- I think this should be an `Arc` ) -> Result<Arc<dyn PhysicalExpr>> { ``` I think that would also make it clearer that physical planning makes a read only copy of the configuration (`Arc<ConfigOptions>`) that is then unchanged during execution ########## datafusion/core/src/datasource/listing/helpers.rs: ########## @@ -241,6 +241,7 @@ async fn prune_partitions( partitions: Vec<Partition>, filters: &[Expr], partition_cols: &[(String, DataType)], + ctx: &SessionState, Review Comment: I recommend passing `&ConfigOptions` directly here as the function only needs the config, not the entire session state -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org