alamb commented on code in PR #13527:
URL: https://github.com/apache/datafusion/pull/13527#discussion_r1899131349


##########
datafusion-examples/examples/composed_extension_codec.rs:
##########
@@ -71,8 +71,10 @@ async fn main() {
 
     // deserialize proto back to execution plan
     let runtime = ctx.runtime_env();
+    let state = ctx.state();
+    let config_options = state.config_options();
     let result_exec_plan: Arc<dyn ExecutionPlan> = proto
-        .try_into_physical_plan(&ctx, runtime.deref(), &composed_codec)
+        .try_into_physical_plan(&ctx, config_options, runtime.deref(), 
&composed_codec)

Review Comment:
   This is an API change, but I think it is required to thread the config 
options through as each argument is specific
   
   We could potentially improve the `try_into_physical_plan` API (as a follow 
on PR) to make it easier to update the API in the future using a trait or 
something like
   
   
https://github.com/apache/datafusion/blob/e99e02b9b9093ceb0c13a2dd32a2a89beba47930/datafusion/expr/src/expr_schema.rs#L39-L38
   
   
   
   
   So this would look something like 
   ```rust
   pub trait ProtobufContext { 
     /// return a function registry
     fn function_registry(&self) -> &dyn FunctionRegistry;
     /// return the runtime env
     fn runtime_env(&self) -> &RuntimeEnv;
     /// return the config options
     fn config_options(&self) -> &ConfigOptions;
     /// return extension codec
     fn extension_codec(&self) -> &dyn PhysicalExtensionCodec;
   }
   ```
   
   ```rust
   impl AsExecutionPlan for protobuf::PhysicalPlanNode {
   ...
       fn try_into_physical_plan(
           &self,
           registry: &dyn FunctionRegistry,
           config_options: &ConfigOptions,
           runtime: &RuntimeEnv,
           extension_codec: &dyn PhysicalExtensionCodec,
       ) -> Result<Arc<dyn ExecutionPlan>> {
   



##########
datafusion-examples/examples/pruning.rs:
##########
@@ -187,7 +188,9 @@ impl PruningStatistics for MyCatalog {
 fn create_pruning_predicate(expr: Expr, schema: &SchemaRef) -> 
PruningPredicate {
     let df_schema = DFSchema::try_from(schema.as_ref().clone()).unwrap();
     let props = ExecutionProps::new();
-    let physical_expr = create_physical_expr(&expr, &df_schema, 
&props).unwrap();
+    let config_options = ConfigOptions::default();

Review Comment:
   Something we could potentially do to make this API slightly easier to use 
might be to create a static default `ConfigOptions` 
   
   Something like
   ```rust
   impl ConfigOptions {
     /// returns a reference to default ConfigOptions
     pub fn default_singleton() -> &'static ConfigOptions 
   }
   ```
   
   This would then make it easier to return `&ConfigOptions` in various places 
when only the default was needed
   
   For example, then in `LocalCsvTableFunc` you could avoid having to thread 
the `ConfigOptions` through as in that example having the actual config options 
isn't important



##########
datafusion/physical-expr/src/scalar_function.rs:
##########
@@ -243,6 +292,7 @@ pub fn create_physical_expr(
             Arc::new(fun.clone()),
             input_phy_exprs.to_vec(),
             return_type,
+            Arc::new(config_options.clone()),

Review Comment:
   This line worries me -- it means each distinct scalar function in the plan 
will get an entirely new copy of the ConfigOptions.
   
   I think the `Arc` should be passed in as the argument like this (I realize 
this will be a significant code change) so that the config options are copied 
at most once per plan
   
   ```rust
   /// Create a physical expression for the UDF.
   pub fn create_physical_expr(
       fun: &ScalarUDF,
       input_phy_exprs: &[Arc<dyn PhysicalExpr>],
       input_schema: &Schema,
       args: &[Expr],
       input_dfschema: &DFSchema,
       config_options: &Arc<ConfigOptions>, // <--- I think this should be an 
`Arc`
   ) -> Result<Arc<dyn PhysicalExpr>> {
   ```
   
   I think that would also make it clearer that physical planning makes a read 
only copy of the configuration (`Arc<ConfigOptions>`) that is then unchanged 
during execution



##########
datafusion/core/src/datasource/listing/helpers.rs:
##########
@@ -241,6 +241,7 @@ async fn prune_partitions(
     partitions: Vec<Partition>,
     filters: &[Expr],
     partition_cols: &[(String, DataType)],
+    ctx: &SessionState,

Review Comment:
   I recommend passing `&ConfigOptions` directly here as the function only 
needs the config, not the entire session state



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to