alamb commented on code in PR #11534:
URL: https://github.com/apache/datafusion/pull/11534#discussion_r1683479192
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -86,13 +108,35 @@ impl CaseExpr {
when_then_expr: Vec<WhenThen>,
else_expr: Option<Arc<dyn PhysicalExpr>>,
) -> Result<Self> {
+ // normalize null literals to None in the else_expr (this already
happens
+ // during SQL planning, but not necessarily for other use cases)
+ let else_expr = match &else_expr {
+ Some(e) => match e.as_any().downcast_ref::<Literal>() {
+ Some(lit) if lit.value().is_null() => None,
+ _ => else_expr,
+ },
+ _ => else_expr,
+ };
+
if when_then_expr.is_empty() {
exec_err!("There must be at least one WHEN clause")
} else {
+ let eval_method = if expr.is_some() {
+ EvalMethod::WithExpression
+ } else if when_then_expr.len() == 1
Review Comment:
I was thinking about this optimization and I think it would be valid for any
CASE expression that had a NULL ELSE, not just column
So in other words, I think you could remove the
`when_then_expr[0].1.as_any().is::<Column>()` check and this would still work
fine
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -256,6 +300,36 @@ impl CaseExpr {
Ok(ColumnarValue::Array(current_value))
}
+
+ /// This function evaluates the specialized case of:
+ ///
+ /// CASE WHEN condition THEN column
+ /// [ELSE NULL]
+ /// END
+ fn case_column_or_null(&self, batch: &RecordBatch) ->
Result<ColumnarValue> {
+ let when_expr = &self.when_then_expr[0].0;
+ let then_expr = &self.when_then_expr[0].1;
+ if let ColumnarValue::Array(bit_mask) = when_expr.evaluate(batch)? {
+ let bit_mask = bit_mask
+ .as_any()
+ .downcast_ref::<BooleanArray>()
+ .expect("predicate should evaluate to a boolean array");
+ // invert the bitmask
+ let bit_mask = not(bit_mask)?;
+ match then_expr.evaluate(batch)? {
+ ColumnarValue::Array(array) => {
+ Ok(ColumnarValue::Array(nullif(&array, &bit_mask)?))
+ }
+ ColumnarValue::Scalar(_) => Err(DataFusionError::Execution(
Review Comment:
this might be better as an internal error
You could also use the macros like
```rust
internal_err!("expression did not evaluate to an array")
```
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -998,6 +1080,53 @@ mod tests {
Ok(())
}
+ #[test]
Review Comment:
I think it would be good to make sure we had a `slt` level test to cover
this as well,
Maybe in
https://github.com/apache/datafusion/blob/382bf4f3c7a730828684b9e4ce01369b89717e19/datafusion/sqllogictest/test_files/expr.slt
Or we could start adding a file just for CASE if we are about to spend a
bunch of time optimizing it 🤔
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -256,6 +300,36 @@ impl CaseExpr {
Ok(ColumnarValue::Array(current_value))
}
+
+ /// This function evaluates the specialized case of:
+ ///
+ /// CASE WHEN condition THEN column
+ /// [ELSE NULL]
+ /// END
+ fn case_column_or_null(&self, batch: &RecordBatch) ->
Result<ColumnarValue> {
+ let when_expr = &self.when_then_expr[0].0;
+ let then_expr = &self.when_then_expr[0].1;
+ if let ColumnarValue::Array(bit_mask) = when_expr.evaluate(batch)? {
+ let bit_mask = bit_mask
+ .as_any()
+ .downcast_ref::<BooleanArray>()
+ .expect("predicate should evaluate to a boolean array");
+ // invert the bitmask
+ let bit_mask = not(bit_mask)?;
+ match then_expr.evaluate(batch)? {
+ ColumnarValue::Array(array) => {
+ Ok(ColumnarValue::Array(nullif(&array, &bit_mask)?))
+ }
+ ColumnarValue::Scalar(_) => Err(DataFusionError::Execution(
+ "expression did not evaluate to an array".to_string(),
+ )),
+ }
+ } else {
+ Err(DataFusionError::Execution(
+ "predicate did not evaluate to an array".to_string(),
Review Comment:
I agree with @viirya I think it would be fairly straightforward here to
handle `ColumnarValue::Scalar` s well. I don't think we need to do it in this PR
Given you say in
https://github.com/apache/datafusion/pull/11534/files#r1683081792
> I am planning on adding a specialization for scalar values as well to
avoid converting scalars to arrays
I think we could definitely do it in a follow on PR
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]