2010YOUY01 commented on code in PR #19587:
URL: https://github.com/apache/datafusion/pull/19587#discussion_r2656071853


##########
datafusion/common/src/utils/mod.rs:
##########
@@ -1026,22 +1026,26 @@ mod tests {
                 ScalarValue::Int32(Some(2)),
                 Null,
                 ScalarValue::Int32(Some(0)),
-            ] < vec![
+            ]
+            .partial_cmp(&vec![
                 ScalarValue::Int32(Some(2)),
                 Null,
                 ScalarValue::Int32(Some(1)),
-            ]
+            ])
+            .is_none()

Review Comment:
   TL;DR: I suggest we follow PostgreSQL's behavior and return true here.
   
   By the SQL definition, this comparison should return Null.
   SQL Null behavior reference: 
https://github.com/apache/datafusion/blob/b818f93416d18d06374a0707f5ef571f8a384070/datafusion/pruning/src/pruning_predicate.rs#L113
   
   However, Postgres and DuckDB both treat Null as equal to Null when the Null 
is inside a composite type:
   
   ```sh
   postgres=# SELECT ARRAY[2, NULL, 0] < ARRAY[2, NULL, 1];
    ?column?
   ----------
    t
   (1 row)
   
   D SELECT [2, NULL, 0] < [2, NULL, 1] AS result;
   ┌─────────┐
   │ result  │
   │ boolean │
   ├─────────┤
   │ true    │
   └─────────┘
   ```
   
   Postgres explains the rationale here:
   
https://www.postgresql.org/docs/current/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON
   I’ve read that section three times now, and I’ll be honest: I still have no 
idea what they’re talking about 😅 
   
   DuckDB's documentation says it follows the Postgres behavior:
   https://duckdb.org/docs/stable/sql/data_types/list#comparison-and-ordering
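   
   For illustration, here is a minimal Rust sketch of the two semantics 
discussed above, using `Option<i32>` as a stand-in for a nullable list element. 
The function names `cmp_strict` and `cmp_nulls_equal` are hypothetical and are 
not DataFusion APIs. How a Null compares against a non-null element is 
deliberately left undecided (`None`) in this sketch, since the examples above 
do not cover that case.
   
   ```rust
   use std::cmp::Ordering;

   /// Strict SQL-style comparison (hypothetical helper): if a Null is reached
   /// before the ordering is decided, the result is unknown (`None`).
   fn cmp_strict(a: &[Option<i32>], b: &[Option<i32>]) -> Option<Ordering> {
       for (x, y) in a.iter().zip(b.iter()) {
           match (x, y) {
               (Some(x), Some(y)) => match x.cmp(y) {
                   Ordering::Equal => continue,
                   decided => return Some(decided),
               },
               // Any Null involved: the comparison is unknown.
               _ => return None,
           }
       }
       // All shared positions are equal: the shorter list sorts first.
       Some(a.len().cmp(&b.len()))
   }

   /// Composite-style comparison (hypothetical helper): Null equals Null, so
   /// the comparison moves on to the next element.
   fn cmp_nulls_equal(a: &[Option<i32>], b: &[Option<i32>]) -> Option<Ordering> {
       for (x, y) in a.iter().zip(b.iter()) {
           match (x, y) {
               (None, None) => continue,
               (Some(x), Some(y)) => match x.cmp(y) {
                   Ordering::Equal => continue,
                   decided => return Some(decided),
               },
               // Assumption: Null vs. non-null is left undecided here.
               _ => return None,
           }
       }
       Some(a.len().cmp(&b.len()))
   }
   ```
   
   With these definitions, comparing `[2, NULL, 0]` against `[2, NULL, 1]` is 
unknown under `cmp_strict` and `Less` under `cmp_nulls_equal`, which matches the 
`true` that Postgres and DuckDB return for `<`.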



##########
datafusion/common/src/scalar/mod.rs:
##########
@@ -723,7 +727,7 @@ impl PartialOrd for ScalarValue {
                 if k1 == k2 { v1.partial_cmp(v2) } else { None }
             }
             (Dictionary(_, _), _) => None,
-            (Null, Null) => Some(Ordering::Equal),
+            // Null is handled by the early return above, but we need this for 
exhaustiveness

Review Comment:
   Should we do something like
   ```rust
   (Null, Null) | (Null, _) | (_, Null) => unreachable!(
       "Nulls are already handled before entering this match arm"
   ),
   ```
   to be more defensive?
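   
   A self-contained sketch of the pattern, in case it helps. The `Value` enum 
and `partial_cmp_values` function are hypothetical stand-ins, not the actual 
`ScalarValue` code; the point is only that the early return makes the Null 
arms unreachable, so a panic there would flag a future regression.
   
   ```rust
   use std::cmp::Ordering;

   // Hypothetical, simplified stand-in for a scalar type with a Null variant.
   #[derive(Debug, PartialEq)]
   enum Value {
       Null,
       Int(i32),
       Text(String),
   }

   fn partial_cmp_values(a: &Value, b: &Value) -> Option<Ordering> {
       // Early return: a Null on either side is incomparable.
       if matches!(a, Value::Null) || matches!(b, Value::Null) {
           return None;
       }
       match (a, b) {
           (Value::Int(x), Value::Int(y)) => Some(x.cmp(y)),
           (Value::Text(x), Value::Text(y)) => Some(x.cmp(y)),
           // Mismatched types are incomparable.
           (Value::Int(_), Value::Text(_)) | (Value::Text(_), Value::Int(_)) => None,
           // Needed for exhaustiveness only; the early return above means
           // these arms can never be taken.
           (Value::Null, _) | (_, Value::Null) => unreachable!(
               "Nulls are already handled before entering this match arm"
           ),
       }
   }
   ```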



##########
datafusion/common/src/scalar/mod.rs:
##########
@@ -5760,10 +5764,9 @@ mod tests {
             .unwrap(),
             Ordering::Less
         );
-        assert_eq!(
+        assert!(
             ScalarValue::try_cmp(&ScalarValue::Int32(None), 
&ScalarValue::Int32(Some(2)))

Review Comment:
   It would be great to update the doc comment for `try_cmp`. It currently only 
says that it errors for incompatible types, but after this change it also 
returns an error for null inputs.
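   
   As a rough idea of the shape, here is a sketch on a hypothetical helper 
(`try_cmp_ints` is not a DataFusion function, and the wording is only a 
suggestion); the real doc comment would list the null case alongside the 
existing incompatible-types case.
   
   ```rust
   use std::cmp::Ordering;

   /// Compares two nullable integers.
   ///
   /// # Errors
   ///
   /// Returns an error if either input is null, because the ordering of a
   /// null against any other value is undefined.
   fn try_cmp_ints(a: Option<i32>, b: Option<i32>) -> Result<Ordering, String> {
       match (a, b) {
           (Some(x), Some(y)) => Ok(x.cmp(&y)),
           // At least one side is null: report it instead of guessing an order.
           _ => Err(format!("cannot compare {a:?} with {b:?}: null input")),
       }
   }
   ```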



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
