bert-beyondloops opened a new issue, #16623:
URL: https://github.com/apache/datafusion/issues/16623

   ### Describe the bug
   
   When creating the unnest logical plan directly, the projection push down 
optimiser does not eliminate unused columns.
   
   Example plan : 
   
   CREATE TABLE foo (list_column ARRAY<INTEGER>,  int_a INTEGER,  int_b 
INTEGER) AS VALUES (MAKE_ARRAY(1, 2, 3), 1,1); 
   
   DESCRIBE SELECT unnest(list_column), int_a FROM foo;
   
   From within a SQL context, the projection pushdown is working correctly and 
eliminates the "int_b" column on the table scan
   
   ```
   
+---------------+---------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                                         |
   
+---------------+---------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: __unnest_placeholder(foo.list_column,depth=1) 
AS UNNEST(foo.list_column), foo.int_a                     |
   |               |   Unnest: 
lists[__unnest_placeholder(foo.list_column)|depth=1] structs[]                  
                          |
   |               |     Projection: foo.list_column AS 
__unnest_placeholder(foo.list_column), foo.int_a                                
 |
   |               |       TableScan: foo projection=[list_column, int_a]       
                                                         |
   | physical_plan | ProjectionExec: 
expr=[__unnest_placeholder(foo.list_column,depth=1)@0 as 
UNNEST(foo.list_column), int_a@1 as int_a] |
   |               |   UnnestExec                                               
                                                         |
   |               |     ProjectionExec: expr=[list_column@0 as 
__unnest_placeholder(foo.list_column), int_a@1 as int_a]                 |
   |               |       DataSourceExec: partitions=1, partition_sizes=[1]    
                                                         |
   |               |                                                            
                                                         |
   
+---------------+---------------------------------------------------------------------------------------------------------------------+
   ```
   
   Creating a logical plan directly via for example the LogicalPlanBuilder :
   
   ```
   #[tokio::test]
       async fn test_unnest_push_down_projection_via_logical_plan() -> 
Result<(), Box<dyn Error>> {
           let ctx = SessionContext::new();
           let df = ctx.sql("CREATE TABLE foo (list_column ARRAY<INTEGER>,  
int_a INTEGER,  int_b INTEGER) AS VALUES (MAKE_ARRAY(1, 2, 3), 1,1);").await?;
           let batches = df.collect().await?;
           pretty::print_batches(&batches)?;
   
           let table_source = 
provider_as_source(ctx.table_provider("foo").await?);
           let builder = LogicalPlanBuilder::scan("foo", table_source, None)?;
           let builder = builder.unnest_column(Column::from("list_column"))?;
           let builder = builder.project(vec![col("list_column"), 
col("int_a")])?;
   
           let plan = builder.build()?;
           let plan = ctx.state().optimize(&plan)?;
           println!("{}", plan);
   
           Ok(())
       }
   ```
   
   boils down to the following optimized logical plan : 
   
   ```
   Projection: list_column, foo.int_a
     Unnest: lists[foo.list_column|depth=1] structs[]
       TableScan: foo projection=[list_column, int_a, int_b]
   ```
   
   => Column "int_b" is not projected away on the table scan.
   
   
   ### To Reproduce
   
    ```
   #[tokio::test]
       async fn test_unnest_push_down_projection_via_logical_plan() -> 
Result<(), Box<dyn Error>> {
           let ctx = SessionContext::new();
           let df = ctx.sql("CREATE TABLE foo (list_column ARRAY<INTEGER>,  
int_a INTEGER,  int_b INTEGER) AS VALUES (MAKE_ARRAY(1, 2, 3), 1,1);").await?;
           let batches = df.collect().await?;
           pretty::print_batches(&batches)?;
   
           let table_source = 
provider_as_source(ctx.table_provider("foo").await?);
           let builder = LogicalPlanBuilder::scan("foo", table_source, None)?;
           let builder = builder.unnest_column(Column::from("list_column"))?;
           let builder = builder.project(vec![col("list_column"), 
col("int_a")])?;
   
           let plan = builder.build()?;
           let plan = ctx.state().optimize(&plan)?;
           println!("{}", plan);
   
           Ok(())
       }
   `
   
   
   ### Expected behavior
   
   
   Even when creating an unnest logica plan directly, it should project away 
unneeded columns.
   
   ### Additional context
   
   It seems the projection is a one time effort currently done on the level of 
SQL to logical plan translation. 
   
   See select.rs:
   
   `let plan = LogicalPlanBuilder::from(intermediate_plan)
       .project(inner_projection_exprs)?
       .unnest_columns_with_options(unnest_col_vec, unnest_options)?
       .build()?;`
   
   
   whereas in the actual optimization rule, there is no evaluation anymore for 
the unnest logical plan.
   
   optimize_projections/mod.rs:
   
   
   ```
   LogicalPlan::Unnest(Unnest {
       dependency_indices, ..
   }) => {
       vec![RequiredIndices::new_from_indices(
           dependency_indices.clone(),
       )]
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to