alamb commented on code in PR #16578:
URL: https://github.com/apache/datafusion/pull/16578#discussion_r2195957972


##########
datafusion/core/src/dataframe/mod.rs:
##########
@@ -1681,6 +1681,40 @@ impl DataFrame {
         })
     }
 
+    /// Calculate the distinct intersection of two [`DataFrame`]s.  The two [`DataFrame`]s must have exactly the same schema

Review Comment:
   My concern is that this new API can't be extended: each new variation would need yet another method.
   
   What about following the example of `unnest_columns_with_options`?
   https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.unnest_columns_with_options
   
   Something like:
   ```rust
   struct SetOptions {
       /// Should duplicates be removed from the output
       distinct: bool,
   }
   
   ...
   impl DataFrame {
       pub fn intersect_distinct_with_options(
           self,
           dataframe: DataFrame,
           options: SetOptions,
       ) -> Result<DataFrame> {
           ...
       }
   }
   ```
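
   To make the options-struct idea concrete, here is a minimal, self-contained sketch. It is only an illustration under stated assumptions: `Rows` is a hypothetical single-column stand-in for `DataFrame`, and `intersect_with_options` is a hypothetical name, not an existing DataFusion API. It uses only the standard library.

   ```rust
   use std::collections::{HashMap, HashSet};

   /// Hypothetical options struct, mirroring the `SetOptions` idea above.
   #[derive(Debug, Clone, Default)]
   pub struct SetOptions {
       /// Should duplicates be removed from the output
       pub distinct: bool,
   }

   /// Toy stand-in for a DataFrame (an assumption): one column of integers.
   #[derive(Debug, Clone, PartialEq)]
   pub struct Rows(pub Vec<i64>);

   impl Rows {
       /// Intersect `self` with `other`.
       /// With `distinct: false` each value is kept min(left count, right count) times;
       /// with `distinct: true` each common value is kept once.
       pub fn intersect_with_options(self, other: Rows, options: SetOptions) -> Rows {
           // Count how many times each value occurs on the right-hand side.
           let mut remaining: HashMap<i64, usize> = HashMap::new();
           for v in other.0 {
               *remaining.entry(v).or_insert(0) += 1;
           }

           let mut seen = HashSet::new();
           let mut out = Vec::new();
           for v in self.0 {
               if let Some(n) = remaining.get_mut(&v) {
                   if *n == 0 {
                       continue;
                   }
                   if options.distinct {
                       // Emit each common value only the first time it is seen.
                       if seen.insert(v) {
                           out.push(v);
                       }
                   } else {
                       // Consume one matching occurrence from the right-hand side.
                       *n -= 1;
                       out.push(v);
                   }
               }
           }
           Rows(out)
       }
   }
   ```

   A caller would write something like `left.intersect_with_options(right, SetOptions { distinct: true })`. The appeal of this shape is that later behaviors become new fields on `SetOptions` instead of new methods; callers who build the struct with `..Default::default()` would be less likely to break when a field is added.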



##########
datafusion/core/src/dataframe/mod.rs:
##########
@@ -1681,6 +1681,40 @@ impl DataFrame {
         })
     }
 
+    /// Calculate the distinct intersection of two [`DataFrame`]s.  The two [`DataFrame`]s must have exactly the same schema

Review Comment:
   I think we should perhaps do this in a follow-on PR, since it would be a net-new feature, and adding these functions is useful in its own right.
   
   Just using the existing functionality seems very reasonable to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

