alamb commented on code in PR #18923:
URL: https://github.com/apache/datafusion/pull/18923#discussion_r2575091075


##########
datafusion/common/src/pruning.rs:
##########
@@ -560,6 +560,79 @@ mod tests {
         assert_eq!(partition_stats.num_containers(), 2);
     }
 
+    #[test]
+    fn test_partition_pruning_statistics_multiple_positive_values() {
+        let partition_values = vec![
+            vec![ScalarValue::from(1i32), ScalarValue::from(2i32)],
+            vec![ScalarValue::from(3i32), ScalarValue::from(4i32)],
+        ];
+        let partition_fields = vec![
+            Arc::new(Field::new("a", DataType::Int32, false)),
+            Arc::new(Field::new("b", DataType::Int32, false)),
+        ];
+        let partition_stats =
+            PartitionPruningStatistics::try_new(partition_values, 
partition_fields)
+                .unwrap();
+
+        let column_a = Column::new_unqualified("a");
+
+        let values = HashSet::from([ScalarValue::from(1i32), 
ScalarValue::from(3i32)]);

Review Comment:
   This doesn't seem right to me 🤔 
   
   According to 
   
https://docs.rs/datafusion/latest/datafusion/common/pruning/trait.PruningStatistics.html#tymethod.contained
   
   > The returned array has one row for each container, with the following 
meanings:
   > 
   > * true if the values in column ONLY contain values from values
   > * false if the values in column are NOT ANY of values
   > * null if the neither of the above holds or is unknown.
   
   
   This test I think has `a` with values `1` and `2`. So the result of 
`contains` for values `(1,3)` should be `NULL` as `3` is not in values...
   
   Maybe I am missing something here



##########
datafusion/common/src/pruning.rs:
##########
@@ -560,6 +560,79 @@ mod tests {
         assert_eq!(partition_stats.num_containers(), 2);
     }
 
+    #[test]
+    fn test_partition_pruning_statistics_multiple_positive_values() {
+        let partition_values = vec![
+            vec![ScalarValue::from(1i32), ScalarValue::from(2i32)],
+            vec![ScalarValue::from(3i32), ScalarValue::from(4i32)],
+        ];
+        let partition_fields = vec![
+            Arc::new(Field::new("a", DataType::Int32, false)),
+            Arc::new(Field::new("b", DataType::Int32, false)),
+        ];
+        let partition_stats =
+            PartitionPruningStatistics::try_new(partition_values, 
partition_fields)
+                .unwrap();
+
+        let column_a = Column::new_unqualified("a");
+
+        let values = HashSet::from([ScalarValue::from(1i32), 
ScalarValue::from(3i32)]);

Review Comment:
   Given no other tests fail, we clearly have some sort of test coverage gap



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to