findepi commented on code in PR #17476:
URL: https://github.com/apache/datafusion/pull/17476#discussion_r2335808457


##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -563,11 +563,14 @@ fn estimate_inner_join_cardinality(
         .iter()
         .zip(right_stats.column_statistics.iter())
     {
-        // Break if any of statistics bounds are undefined
-        if left_stat.min_value.get_value().is_none()
-            || left_stat.max_value.get_value().is_none()
-            || right_stat.min_value.get_value().is_none()
-            || right_stat.max_value.get_value().is_none()
+        // Break if we don't have enough information to calculate a distinct 
count
+        // If distinct_count isn't provided directly, we need min and max to 
be provided
+        if (left_stat.distinct_count.get_value().is_none()
+            && (left_stat.min_value.get_value().is_none()
+                || left_stat.max_value.get_value().is_none()))
+            || (right_stat.distinct_count.get_value().is_none()
+                && (right_stat.min_value.get_value().is_none()
+                    || right_stat.max_value.get_value().is_none()))

Review Comment:
   Would this be easier to read?
   
   
   
   
   ```suggestion
           let has_enough_info = |stat: &ColumnStatistics| {
               stat.distinct_count.get_value().is_some()
                   || (stat.min_value.get_value().is_some()
                       && stat.max_value.get_value().is_some())
           };
           if !has_enough_info(left_stat) || !has_enough_info(right_stat)
   ```



##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -2016,20 +2019,20 @@ mod tests {
             ),
             // When we have distinct count.
             (
-                (10, Inexact(1), Inexact(10), Inexact(10), Absent),
-                (10, Inexact(1), Inexact(10), Inexact(10), Absent),

Review Comment:
   Why change existing test cases?
   Maybe let's just add new ones?



##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -2071,18 +2074,13 @@ mod tests {
             ),
             // No min or max (or both).
             (
-                (10, Absent, Absent, Inexact(3), Absent),
-                (10, Absent, Absent, Inexact(3), Absent),
-                None,
-            ),
-            (
-                (10, Inexact(2), Absent, Inexact(3), Absent),
-                (10, Absent, Inexact(5), Inexact(3), Absent),
+                (10, Absent, Absent, Absent, Absent),
+                (10, Absent, Absent, Absent, Absent),

Review Comment:
   This became exact same as "No column level stats." input (one above).
   
   Let's not take changing inputs in existing test cases lightly.
   If they are mislabelled, perhaps let's just fix the labels 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to