Re: [PR] Fix duplicate unqualified Field name (schema error) on join queries [datafusion]

via GitHub Fri, 04 Apr 2025 12:44:51 -0700


gabotechs commented on code in PR #15438:
URL: https://github.com/apache/datafusion/pull/15438#discussion_r2018708072



##########
datafusion/substrait/tests/cases/consumer_integration.rs:
##########
@@ -483,6 +483,34 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn test_multiple_joins() -> Result<()> {
+        let plan_str = test_plan_to_string("multiple_joins.json").await?;
+        println!("{}", plan_str);

Review Comment:
   We should be fine to remove this `println!` statement now



##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -1470,17 +1470,27 @@ impl ValuesFields {
 
 pub fn change_redundant_column(fields: &Fields) -> Vec<Field> {

Review Comment:
   This same file has a unit test for this specific function, it would be nice 
to extend it to something like this:
   
   <details><summary>change_redundant_column unit test</summary>
   
   ```rust
       #[test]
       fn test_change_redundant_column() -> Result<()> {
           let t1_field_1 = Field::new("a", DataType::Int32, false);
           let t2_field_1 = Field::new("a", DataType::Int32, false);
           let t2_field_3 = Field::new("a", DataType::Int32, false);
           let t1_field_2 = Field::new("b", DataType::Int32, false);
           let t2_field_2 = Field::new("b", DataType::Int32, false);
           let t2_field_4 = Field::new("a:1", DataType::Int32, false); // <- 
add this
   
           let field_vec = vec![
               t1_field_1, t2_field_1, t1_field_2, t2_field_2, t2_field_3, 
t2_field_4,
           ];
           let remove_redundant = 
change_redundant_column(&Fields::from(field_vec));
   
           assert_eq!(
               remove_redundant,
               vec![
                   Field::new("a", DataType::Int32, false),
                   Field::new("a:1", DataType::Int32, false),
                   Field::new("b", DataType::Int32, false),
                   Field::new("b:1", DataType::Int32, false),
                   Field::new("a:2", DataType::Int32, false),
                   Field::new("a:1:1", DataType::Int32, false), // <- and 
expect this
               ]
           );
           Ok(())
       }
   ```
   </summary>



##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -1470,17 +1470,27 @@ impl ValuesFields {
 
 pub fn change_redundant_column(fields: &Fields) -> Vec<Field> {
     let mut name_map = HashMap::new();
+    let mut seen: HashSet<String> = HashSet::new();

Review Comment:
   At first sight, it's a bit unclear why `name_map` is not sufficient for 
granting uniqueness the generated names. It could be useful for readers to 
leave a comment here with a brief explanation about why this is needed, 
something like
   ```rust
   // `name_map` tracks a mapping between a field name and the amount
   // of times it appeared. If a field appeared twice with the same name, the
   // count for that field will increment by one, and that count will be 
appended
   // to the field name in order to grant uniqueness.
   //
   // Some field names might already come to this function with a literal 
appended
   // count in its name, so there's still a chance of name collisions, for 
example,
   // if these three fields passed to this function: "col", "col" and "col:1".
   //
   // The `seen` HashSet accounts for this, tracking the produced fields are
   // truly unique.
   ```
   Just an idea from what I understood, maybe there's a better way of 
explaining it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] Fix duplicate unqualified Field name (schema error) on join queries [datafusion]

Reply via email to