andygrove opened a new issue, #3184:
URL: https://github.com/apache/datafusion-comet/issues/3184

   ## What is the problem the feature request solves?
   
   > **Note:** This issue was generated with AI assistance. The specification 
details have been extracted from Spark documentation and may need verification.
   
   Comet does not currently support the Spark `decode` function, causing 
queries using this function to fall back to Spark's JVM execution instead of 
running natively on DataFusion.
   
   The `Decode` expression implements SQL `CASE`-like functionality: it compares an input expression against a series of search values and returns the result value paired with the first search value that matches. In Spark it is a runtime-replaceable expression that the analyzer rewrites into an equivalent `CaseWhen` expression during query planning.
   
   Supporting this expression would allow more Spark workloads to benefit from 
Comet's native acceleration.
   
   ## Describe the potential solution
   
   ### Spark Specification
   
   **Syntax:**
   ```sql
   DECODE(expr, search1, result1 [, search2, result2] ... [, default])
   ```
   
   ```scala
   // DataFrame API usage would be through SQL expression or functions
   df.selectExpr("DECODE(column_name, 'value1', 'result1', 'value2', 'result2', 'default')")
   ```
   
   **Arguments:**
   | Argument | Type | Description |
   |----------|------|-------------|
   | expr | Expression | The input expression to be compared against the search values |
   | search | Expression | Search value(s) to match against the input expression |
   | result | Expression | Result value(s) returned when the corresponding search value matches |
   | default | Expression (optional) | Default value returned when no search value matches |
   
   **Return Type:** The return type is determined by the common type of all 
result expressions and the optional default value. The expression performs type 
coercion to find a compatible return type among all possible result values.
   
   **Supported Data Types:**
   The `Decode` expression supports all Spark SQL data types for input and 
comparison:
   
   - Primitive types (numeric, string, boolean, binary)
   - Complex types (array, map, struct)
   - Temporal types (date, timestamp)
   - Null types
   
   **Edge Cases:**
   - A null input expression matches only null search values, via null-safe equality
   - An empty parameter list raises an analysis error
   - An odd number of parameters (excluding the first expression) means the last parameter is treated as the default
   - Type mismatches between result expressions trigger coercion to a common type
   - If no common type can be found, analysis fails with an exception
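
   The matching semantics above (null-safe equality, first match wins, optional trailing default, null on no match) can be sketched in plain Scala. This is a hedged model for illustration only: `decodeEval`, its `Option[Any]` encoding of nullable SQL values (`None` standing in for SQL `NULL`), and the flat argument list are assumptions, not Comet or Spark APIs.

   ```scala
   // Model of DECODE(expr, search1, result1, ..., [default]) evaluation.
   // Nullable SQL values are modeled as Option[Any]; None represents SQL NULL.
   object DecodeModel {
     def decodeEval(input: Option[Any], args: Seq[Option[Any]]): Option[Any] = {
       // An odd number of trailing arguments means the last one is the default;
       // with no default, a non-match yields None (SQL NULL).
       val (pairArgs, default) =
         if (args.length % 2 == 1) (args.init, args.last) else (args, Option.empty[Any])

       pairArgs
         .grouped(2)
         .collectFirst {
           // Null-safe equality: None matches None, otherwise plain equality.
           case Seq(search, result) if search == input => result
         }
         .getOrElse(default)
     }
   }
   ```

   For example, `decodeEval(Some("A"), Seq(Some("A"), Some("Active"), Some("Unknown")))` returns `Some("Active")`, while a `None` input matches a `None` search value rather than falling through to the default.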
   
   **Examples:**
   ```sql
   -- Basic decode with default
   SELECT DECODE(status, 'A', 'Active', 'I', 'Inactive', 'Unknown') FROM users;
   
   -- Decode with numeric values
   SELECT DECODE(grade, 1, 'Poor', 2, 'Fair', 3, 'Good', 4, 'Excellent') FROM 
reviews;
   
   -- Decode without default (returns null for non-matches)
   SELECT DECODE(category, 'TECH', 'Technology', 'BIZ', 'Business') FROM 
articles;
   ```
   
   ```scala
   // Example DataFrame API usage
   import org.apache.spark.sql.functions.expr
   
   df.select(expr("DECODE(status_code, 200, 'OK', 404, 'Not Found', 500, 'Error', 'Unknown')"))
   
   // Using with column references
   df.selectExpr("DECODE(department, 'ENG', 'Engineering', 'MKT', 'Marketing', 'Other')")
   ```
   
   ### Implementation Approach
   
   See the [Comet guide on adding new expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html) for detailed instructions.
   
   1. **Scala Serde**: Add expression handler in 
`spark/src/main/scala/org/apache/comet/serde/`
   2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
   3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if 
needed
   4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has 
built-in support first)
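
   As a rough guide, the serde translation can mirror the rewrite Spark itself applies for this runtime-replaceable expression: split the parameters into (search, result) pairs plus an optional trailing default, then emit one `CaseWhen` branch per pair, guarded by null-safe equality. The sketch below models that rewrite on a toy AST; `Lit`, `EqNullSafe`, `CaseWhen`, and `Decode` here are illustrative stand-ins, not real Catalyst or Comet classes, though Spark's `CaseWhen` has the same branches-plus-optional-else shape.

   ```scala
   // Toy Catalyst-like AST illustrating the Decode => CaseWhen rewrite.
   sealed trait Expr
   case class Lit(value: Any) extends Expr
   case class EqNullSafe(left: Expr, right: Expr) extends Expr
   case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr
   case class Decode(input: Expr, params: Seq[Expr]) extends Expr

   object DecodeRewrite {
     def rewrite(d: Decode): CaseWhen = {
       // An odd parameter count means the last parameter is the default branch.
       val (pairs, default) =
         if (d.params.length % 2 == 1) (d.params.init, Some(d.params.last))
         else (d.params, None)
       // One branch per (search, result) pair, guarded by null-safe equality.
       val branches = pairs.grouped(2).map {
         case Seq(search, result) => (EqNullSafe(d.input, search): Expr, result)
       }.toSeq
       CaseWhen(branches, default)
     }
   }
   ```

   Under this model, `DECODE(x, 'A', 'Active', 'Unknown')` becomes `CASE WHEN x <=> 'A' THEN 'Active' ELSE 'Unknown' END`, which suggests the serde work may be able to reuse Comet's existing `CaseWhen` support rather than requiring a new native kernel.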
   
   
   ## Additional context
   
   **Difficulty:** Medium
   **Spark Expression Class:** 
`org.apache.spark.sql.catalyst.expressions.Decode`
   
   **Related:**
   - `CaseWhen` - The underlying expression that `Decode` typically gets 
transformed into
   - `When` - For building conditional expressions in DataFrame API
   - `Coalesce` - For handling null values with fallback logic
   
   ---
   *This issue was auto-generated from Spark reference documentation.*
   

