gabotechs commented on code in PR #15438: URL: https://github.com/apache/datafusion/pull/15438#discussion_r2018708072
########## datafusion/substrait/tests/cases/consumer_integration.rs: ########## @@ -483,6 +483,34 @@ mod tests { Ok(()) } + #[tokio::test] + async fn test_multiple_joins() -> Result<()> { + let plan_str = test_plan_to_string("multiple_joins.json").await?; + println!("{}", plan_str); Review Comment: We should be fine to remove this `println!` statement now ########## datafusion/expr/src/logical_plan/builder.rs: ########## @@ -1470,17 +1470,27 @@ impl ValuesFields { pub fn change_redundant_column(fields: &Fields) -> Vec<Field> { Review Comment: This same file has a unit test for this specific function, it would be nice to extend it to something like this: <details><summary>change_redundant_column unit test</summary> ```rust #[test] fn test_change_redundant_column() -> Result<()> { let t1_field_1 = Field::new("a", DataType::Int32, false); let t2_field_1 = Field::new("a", DataType::Int32, false); let t2_field_3 = Field::new("a", DataType::Int32, false); let t1_field_2 = Field::new("b", DataType::Int32, false); let t2_field_2 = Field::new("b", DataType::Int32, false); let t2_field_4 = Field::new("a:1", DataType::Int32, false); // <- add this let field_vec = vec![ t1_field_1, t2_field_1, t1_field_2, t2_field_2, t2_field_3, t2_field_4, ]; let remove_redundant = change_redundant_column(&Fields::from(field_vec)); assert_eq!( remove_redundant, vec![ Field::new("a", DataType::Int32, false), Field::new("a:1", DataType::Int32, false), Field::new("b", DataType::Int32, false), Field::new("b:1", DataType::Int32, false), Field::new("a:2", DataType::Int32, false), Field::new("a:1:1", DataType::Int32, false), // <- and expect this ] ); Ok(()) } ``` </summary> ########## datafusion/expr/src/logical_plan/builder.rs: ########## @@ -1470,17 +1470,27 @@ impl ValuesFields { pub fn change_redundant_column(fields: &Fields) -> Vec<Field> { let mut name_map = HashMap::new(); + let mut seen: HashSet<String> = HashSet::new(); Review Comment: At first sight, it's a bit unclear why `name_map` is not sufficient for granting uniqueness the generated names. It could be useful for readers to leave a comment here with a brief explanation about why this is needed, something like ```rust // `name_map` tracks a mapping between a field name and the amount // of times it appeared. If a field appeared twice with the same name, the // count for that field will increment by one, and that count will be appended // to the field name in order to grant uniqueness. // // Some field names might already come to this function with a literal appended // count in its name, so there's still a chance of name collisions, for example, // if these three fields passed to this function: "col", "col" and "col:1". // // The `seen` HashSet accounts for this, tracking the produced fields are // truly unique. ``` Just an idea from what I understood, maybe there's a better way of explaining it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org