andygrove opened a new issue, #3181:
URL: https://github.com/apache/datafusion-comet/issues/3181
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark `left` function, causing queries
using this function to fall back to Spark's JVM execution instead of running
natively on DataFusion.

The `Left` expression returns the leftmost `len` characters of a string, or the
first `len` bytes of a binary value. It is a `RuntimeReplaceable` expression
whose replacement is `Substring(str, 1, len)`, i.e. a substring that starts at
position 1.
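
As a hedged illustration of that rewrite, the pure-Scala sketch below models the position arithmetic of `SUBSTRING(str, pos, len)` over Unicode code points and defines `left` as `substringSql(str, 1, len)`. It is written from our reading of Spark's `UTF8String.substringSQL`, not copied from it, and the object and method names are hypothetical. With this model, `left("Apache Spark", 3)` gives `"Apa"` and any `len <= 0` gives `""`.

```scala
// Hypothetical reference model (not Spark code): SUBSTRING(str, pos, len)
// over Unicode code points, with LEFT(str, len) defined as SUBSTRING(str, 1, len).
object LeftAsSubstring {

  def substringSql(str: String, pos: Int, len: Int): String = {
    val numChars = str.codePointCount(0, str.length)
    // pos > 0 is 1-based, pos < 0 counts back from the end, pos == 0 acts like 1
    val rawStart = if (pos > 0) pos - 1 else if (pos < 0) numChars + pos else 0
    // Exclusive end, computed in Long so rawStart + len cannot overflow,
    // and capped at the number of characters
    val end = math.min(rawStart.toLong + len, numChars.toLong)
    val start = math.max(rawStart, 0)
    if (start >= end) ""
    else str.substring(str.offsetByCodePoints(0, start), str.offsetByCodePoints(0, end.toInt))
  }

  // With pos fixed at 1, end == min(len, numChars), so any len <= 0 yields ""
  def left(str: String, len: Int): String = substringSql(str, 1, len)
}
```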
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
LEFT(str, len)
```
```scala
// DataFrame API (`left` takes Column arguments, so wrap the literal length with `lit`)
import org.apache.spark.sql.functions._
df.select(left(col("column_name"), lit(5)))
```
**Arguments:**

| Argument | Type | Description |
|----------|------|-------------|
| str | String/Binary | The input string or binary value to extract from |
| len | Integer | The number of characters (bytes, for binary input) to extract from the left side |

**Return Type:** Returns the same data type as the input `str` argument:
- String input returns String
- Binary input returns Binary

**Supported Data Types:**
- **String types**: All string types, including collated strings (trim
collations are accepted)
- **Binary type**: Raw binary data; `len` counts bytes rather than characters
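
For binary input the same prefix rule applies, but per byte. The minimal sketch below assumes the binary path behaves like the string path with bytes in place of characters; `LeftBinary` and its members are hypothetical names. Encoding `"héllo"` as UTF-8 and keeping 2 bytes yields `0x68 0xC3`, which cuts the two-byte `é` in half, something a string `LEFT` would never do.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper (not Spark code): LEFT on BINARY keeps the first `len` bytes.
object LeftBinary {
  // Array.take returns an empty array for len <= 0 and the whole array
  // when len exceeds its length, mirroring the string edge cases
  def leftBinary(bytes: Array[Byte], len: Int): Array[Byte] = bytes.take(len)

  // "h\u00e9llo" is 6 bytes in UTF-8 because \u00e9 (é) encodes as 0xC3 0xA9
  val sample: Array[Byte] = "h\u00e9llo".getBytes(UTF_8)
}
```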

**Edge Cases:**
- **Null handling**: If either `str` or `len` is null, the result is null
- **Negative length**: Because the start position is fixed at 1, a negative
`len` returns an empty result, just like `len = 0`
- **Length exceeds input**: Returns the entire input when `len` is greater
than its length
- **Zero length**: Returns an empty string when `len` is 0
- **Empty string input**: Returns an empty string for any non-null `len`

**Examples:**
```sql
-- Extract first 3 characters
SELECT LEFT('Apache Spark', 3); -- Returns 'Apa'
-- With column reference
SELECT LEFT(name, 5) FROM users;
-- With binary data
SELECT LEFT(CAST('binary_data' AS BINARY), 4);
```
```scala
// DataFrame API examples
import org.apache.spark.sql.functions._
// Extract first 3 characters
df.select(left(col("text_column"), lit(3)))
// Dynamic length based on another column
df.select(left(col("description"), col("max_length")))
// With literal string
df.select(left(lit("Apache Spark"), lit(5)))
```
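
The edge cases listed under **Edge Cases** can be pinned down with a small null-aware model, which may be handy as an oracle when writing Comet tests. This is a sketch under our own assumptions: `Option` stands in for SQL NULL, characters are Unicode code points, and `LeftNullSemantics` is a hypothetical name rather than part of Spark or Comet.

```scala
// Hypothetical oracle (not Spark or Comet code): None models SQL NULL.
object LeftNullSemantics {
  def left(str: Option[String], len: Option[Int]): Option[String] =
    for {
      s <- str // NULL str => NULL result
      n <- len // NULL len => NULL result
    } yield {
      // Clamp to [0, numChars]: len <= 0 gives "", an over-long len returns all of s
      val numChars = s.codePointCount(0, s.length)
      val k = math.max(0, math.min(n, numChars))
      s.substring(0, s.offsetByCodePoints(0, k))
    }
}
```

Each edge case then maps to one check, for example `left(Some("Spark"), Some(-1))` is `Some("")` and `left(None, Some(3))` is `None`.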
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add the handler to the appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
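
Before reusing a DataFusion built-in, it is worth checking that its semantics match Spark's. DataFusion provides a `left(str, n)` string function; assuming it follows PostgreSQL, where a negative `n` drops the last `|n|` characters instead of returning an empty string (please verify against the DataFusion version Comet depends on), the two would diverge only for negative lengths. The pure-Scala sketch below models both behaviours side by side; all names in it are hypothetical.

```scala
// Hypothetical comparison model (not Comet or DataFusion code).
object LeftSemanticsComparison {
  // Keep the first k code points, clamping k to [0, numChars]
  private def takeCodePoints(s: String, k: Int): String = {
    val numChars = s.codePointCount(0, s.length)
    val n = math.max(0, math.min(k, numChars))
    s.substring(0, s.offsetByCodePoints(0, n))
  }

  // Spark: LEFT(str, len) == SUBSTRING(str, 1, len), so len <= 0 gives ""
  def sparkLeft(s: String, len: Int): String = takeCodePoints(s, len)

  // Assumed PostgreSQL-style left: a negative n keeps all but the last |n| characters
  def postgresStyleLeft(s: String, n: Int): String =
    if (n >= 0) takeCodePoints(s, n)
    else takeCodePoints(s, s.codePointCount(0, s.length) + n)
}
```

Under this model, `sparkLeft("datafusion", -2)` is `""` while `postgresStyleLeft("datafusion", -2)` is `"datafusi"`. If the assumption holds, clamping `len` to `max(len, 0)` before delegating would make the two agree, since they already match for non-negative lengths; a dedicated kernel in `native/spark-expr/src/` is the other option.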
## Additional context
**Difficulty:** Small

**Spark Expression Class:** `org.apache.spark.sql.catalyst.expressions.Left`

**Related:**
- `Substring` - The expression that `Left` is replaced with at runtime
- `Right` - Extracts characters from the right side of a string
- `Mid`/`Substr` - General substring extraction with custom start position
---
*This issue was auto-generated from Spark reference documentation.*