Hello, I hope this is the right place to ask this.
While working on a project based on arrow-datafusion, I came across some surprising behavior: a projection did not get eliminated as expected, which broke an assumption made by a custom optimizer rule (I won't go into that rule further, as it isn't important for this question).

Specifically, I found that an execution plan like

    Projection [col1, col2]
      TableScan projection=[col1, col2, col3], full_filters=[ ... col3 ...]

is not simplified to

    TableScan projection=[col1, col2], full_filters=[ ... col3 ...]

This does seem to be intentional, as I have found this snippet in the optimizer rule:
<https://github.com/apache/arrow-datafusion/blob/c97048d178594b10b813c6bcd1543f157db4ba3f/datafusion/optimizer/src/push_down_projection.rs#L174>

    ...
    LogicalPlan::TableScan(scan) if !scan.projected_schema.fields().is_empty() => {
        let mut used_columns: HashSet<Column> = HashSet::new();
        // filter expr may not exist in expr in projection.
        // like: TableScan: t1 projection=[bool_col, int_col], full_filters=[t1.id = Int32(1)]
        // projection=[bool_col, int_col] don't contain `t1.id`.
        exprlist_to_columns(&scan.filters, &mut used_columns)?;
    ...

However, the comment does not explain why we need to keep the extra projection and have the scan return the extra column. After all, the filters inside the scan are internal to that scan, and should not affect the schema the rest of the execution plan sees, right?

I am looking forward to any opinions.

Best regards,
Markus Appel
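P.S. To make the behavior concrete, here is a minimal, self-contained sketch (plain Rust, not DataFusion code; all names are made up) of the logic I understand the rule to implement: the set of columns a scan must read is the union of the columns the parent projection needs and the columns referenced by the scan's pushed-down filters.

    use std::collections::HashSet;

    /// Columns the scan has to read so that its filters can be evaluated,
    /// even when the parent projection never uses some of them.
    fn required_scan_columns<'a>(
        projected: &[&'a str],
        filter_columns: &[&'a str],
    ) -> HashSet<&'a str> {
        let mut used: HashSet<&'a str> = projected.iter().copied().collect();
        // Mirrors exprlist_to_columns(&scan.filters, &mut used_columns):
        // filter expressions may reference columns outside the projection.
        used.extend(filter_columns.iter().copied());
        used
    }

    fn main() {
        // Projection [col1, col2] over a scan whose filter references col3:
        let needed = required_scan_columns(&["col1", "col2"], &["col3"]);
        assert!(needed.contains("col3"));
        // The scan therefore reads [col1, col2, col3], its output schema has
        // three columns, and the outer Projection [col1, col2] is still
        // needed to trim col3 -- hence the plan is not collapsed.
        println!("scan must read: {needed:?}");
    }

If the scan's output schema kept only [col1, col2] regardless of the filters, the outer Projection would indeed be redundant, which is essentially what I am asking about.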