mbutrovich commented on code in PR #4277:
URL: https://github.com/apache/datafusion-comet/pull/4277#discussion_r3267419772


##########
native/spark-expr/src/math_funcs/modulo_expr.rs:
##########
@@ -63,6 +64,115 @@ pub fn spark_modulo(args: &[ColumnarValue], fail_on_error: 
bool) -> Result<Colum
     }
 }
 
+/// Spark-compliant pmod (positive modulo) function. Returns the positive 
remainder of division.
+/// If `fail_on_error` is true, returns an error on division by zero; 
otherwise returns `NULL`.
+pub fn spark_pmod(args: &[ColumnarValue], fail_on_error: bool) -> 
Result<ColumnarValue> {
+    if args.len() != 2 {
+        return exec_err!("pmod expects exactly two arguments");
+    }
+
+    let lhs = &args[0];
+    let rhs = &args[1];
+
+    let left_data_type = lhs.data_type();
+    let right_data_type = rhs.data_type();
+
+    if left_data_type.is_nested() {

Review Comment:
   The nested-type branch dispatches to `apply_cmp_for_nested(Operator::Modulo, 
...)`, but two things look off:
   
   1. `Pmod` is numeric-only in Spark, and `CometPmod.supportedDataType` 
already rejects non-numeric types. This branch should be unreachable.
   2. Even if it were reached, it would compute Modulo, not Pmod.
   
   Suggest dropping the whole `if left_data_type.is_nested()` block, or 
replacing the body with `internal_err!("spark_pmod does not support nested 
types")` to make the invariant explicit.



##########
native/spark-expr/src/math_funcs/modulo_expr.rs:
##########
@@ -63,6 +64,115 @@ pub fn spark_modulo(args: &[ColumnarValue], fail_on_error: 
bool) -> Result<Colum
     }
 }
 
+/// Spark-compliant pmod (positive modulo) function. Returns the positive 
remainder of division.
+/// If `fail_on_error` is true, returns an error on division by zero; 
otherwise returns `NULL`.
+pub fn spark_pmod(args: &[ColumnarValue], fail_on_error: bool) -> 
Result<ColumnarValue> {
+    if args.len() != 2 {
+        return exec_err!("pmod expects exactly two arguments");
+    }
+
+    let lhs = &args[0];
+    let rhs = &args[1];
+
+    let left_data_type = lhs.data_type();
+    let right_data_type = rhs.data_type();
+
+    if left_data_type.is_nested() {
+        if right_data_type != left_data_type {
+            return internal_err!("Type mismatch for spark pmod operation");
+        }
+        return apply_cmp_for_nested(Operator::Modulo, lhs, rhs);
+    }
+
+    let arrays = ColumnarValue::values_to_arrays(args)?;
+    let left = &arrays[0];
+    let right = &arrays[1];
+
+    match (|| -> std::result::Result<ArrayRef, arrow::error::ArrowError> {
+        let zero = ScalarValue::new_zero(&left_data_type)
+            .map_err(|e| 
arrow::error::ArrowError::ComputeError(e.to_string()))?
+            .to_array_of_size(left.len())
+            .map_err(|e| 
arrow::error::ArrowError::ComputeError(e.to_string()))?;
+        let result = rem(left, right)?;

Review Comment:
   Could the negative-zero incompatibility be eliminated by branching with 
`zip` instead of doing an unconditional add/rem?
   
   The current shape
   
   ```rust
   let result = rem(left, right)?;
   let neg = arrow::compute::kernels::cmp::lt(&result, &zero)?;
   let plus = arrow::compute::kernels::zip::zip(&neg, right, &zero)?;
   let result = arrow::compute::kernels::numeric::add(&plus, &result)?;
   rem(&result, right)
   ```
   
   normalizes `-0.0` through `add(+0.0, -0.0) = +0.0`, which is the source of 
the documented strict-floating-point caveat. Spark's reference implementation 
is `if (r < 0) (r + n) % n else r`, i.e. the non-negative branch returns `r` 
untouched.
   
   A direct vectorized translation:
   
   ```rust
   let result = rem(left, right)?;
   let neg = arrow::compute::kernels::cmp::lt(&result, &zero)?;
   let adjusted = rem(&arrow::compute::kernels::numeric::add(&result, right)?, 
right)?;
   arrow::compute::kernels::zip::zip(&neg, &adjusted, &result)
   ```
   
   That preserves the original `r` for `r >= 0`, including `-0.0`, and matches 
Spark exactly. Quick sanity check on the interesting inputs:
   
   | input | `r` | `neg` | result |
   |---|---|---|---|
   | `pmod(-0.0, 3.0)` | `-0.0` | false | `-0.0` ✓ |
   | `pmod(NaN, 3.0)` | `NaN` | false (NaN cmp) | `NaN` ✓ |
   | `pmod(-10, 3)` | `-1` | true | `2` ✓ |
   | `pmod(-10, -3)` | `-1` | true | `-1` ✓ |
   
   If this works, `CometPmod` can drop `getIncompatibleReasons` / 
`getSupportLevel` and just be `Compatible()`.
   



##########
spark/src/main/scala/org/apache/comet/serde/arithmetic.scala:
##########
@@ -284,6 +285,46 @@ object CometRemainder extends 
CometExpressionSerde[Remainder] with MathBase {
   }
 }
 
+object CometPmod extends CometExpressionSerde[Pmod] with MathBase {
+
+  private val negativeZeroReason =
+    "pmod may return 0.0 instead of -0.0 for floating-point types " +
+      "and will fall back to Spark when " +
+      s"${CometConf.COMET_EXEC_STRICT_FLOATING_POINT.key}=true. " +
+      CometConf.COMPAT_GUIDE
+
+  override def getIncompatibleReasons(): Seq[String] = Seq(negativeZeroReason)
+
+  override def getSupportLevel(expr: Pmod): SupportLevel = {
+    if (CometConf.COMET_EXEC_STRICT_FLOATING_POINT.get() &&
+      SupportLevel.containsFloatingPoint(expr.left.dataType)) {
+      Incompatible(Some(negativeZeroReason))
+    } else {
+      Compatible()
+    }
+  }
+
+  override def convert(

Review Comment:
   The PR description says "rejecting TRY eval mode", but `CometPmod.convert` 
doesn't have the `expr.evalMode == EvalMode.TRY` guard that `CometRemainder` 
has at line 270.
   
   There's no `try_pmod` in Spark's function registry, so the gap is unlikely 
to be reachable today. But matching `CometRemainder`'s shape keeps the serde 
defensive in case `Pmod` ever gets constructed with a `TRY` context, and aligns 
the code with the PR description.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to