andygrove opened a new issue, #3181:
URL: https://github.com/apache/datafusion-comet/issues/3181

   ## What is the problem the feature request solves?
   
   > **Note:** This issue was generated with AI assistance. The specification 
details have been extracted from Spark documentation and may need verification.
   
   Comet does not currently support the Spark `left` function, causing queries 
using this function to fall back to Spark's JVM execution instead of running 
natively on DataFusion.
   
   The `Left` expression extracts a specified number of characters from the 
left side of a string or binary value. This expression is implemented as a 
`RuntimeReplaceable` that internally uses the `Substring` expression with a 
starting position of 1.
   
   Supporting this expression would allow more Spark workloads to benefit from 
Comet's native acceleration.
   
   ## Describe the potential solution
   
   ### Spark Specification
   
   **Syntax:**
   ```sql
   LEFT(str, len)
   ```
   
   ```scala
   // DataFrame API
   import org.apache.spark.sql.functions._
   df.select(left(col("column_name"), lit(5))) // both arguments are Columns
   ```
   
   **Arguments:**
   | Argument | Type | Description |
   |----------|------|-------------|
   | str | String/Binary | The input string or binary data from which to extract characters |
   | len | Integer | The number of characters (bytes for binary input) to extract from the left side |
   
   **Return Type:** Returns the same data type as the input `str` argument:
   
   - String input returns String
   - Binary input returns Binary
   
   **Supported Data Types:**
   - **String types**: All string types with collation support (specifically 
those supporting trim collation)
   - **Binary type**: Raw binary data
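   
   To make the difference between the two input kinds concrete, the sketch below contrasts character-based truncation for strings with byte-based truncation for binary values. It assumes Spark counts `len` in characters for strings and in bytes for binary input, which should be verified against the targeted Spark versions. `left_chars` and `left_bytes` are hypothetical helper names used only for illustration; they are not Comet or DataFusion APIs.
   
   ```rust
   /// Hypothetical helper: leftmost `len` characters of a UTF-8 string.
   /// Assumption: a non-positive `len` yields an empty result.
   fn left_chars(s: &str, len: i32) -> &str {
       if len <= 0 {
           return "";
       }
       // Find the byte offset where character number `len` starts;
       // if the string has fewer characters, keep the whole string.
       match s.char_indices().nth(len as usize) {
           Some((byte_idx, _)) => &s[..byte_idx],
           None => s,
       }
   }
   
   /// Hypothetical helper: leftmost `len` bytes of a binary value.
   fn left_bytes(b: &[u8], len: i32) -> &[u8] {
       if len <= 0 {
           return &[];
       }
       let end = (len as usize).min(b.len());
       &b[..end]
   }
   ```
   
   For a value such as `"héllo"`, `left_chars` with `len = 2` keeps `"hé"` (three bytes in UTF-8), while `left_bytes` on the same bytes keeps only `0x68 0xC3`, which splits the two-byte `é`. A native implementation therefore needs separate handling for the two types.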
   
   **Edge Cases:**
   - **Null handling**: If either `str` or `len` is null, the result is null
   - **Negative length**: Returns an empty result; Spark documents that a `len` less than or equal to 0 yields an empty string
   - **Length exceeds string**: Returns the entire string when `len` is greater 
than string length
   - **Zero length**: Returns empty string when `len` is 0
   - **Empty string input**: Returns empty string for any non-null `len` value
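   
   The edge cases above can be captured in a small reference model, which may be useful when writing tests for a native implementation. This is a minimal sketch, not Comet code: `spark_left` is a hypothetical name, `None` stands in for SQL NULL, and the non-positive-length behavior follows Spark's function documentation and should be confirmed against real Spark output.
   
   ```rust
   /// Hypothetical reference model of `left(str, len)` for string input.
   /// `None` represents SQL NULL.
   fn spark_left(s: Option<&str>, len: Option<i32>) -> Option<String> {
       // Null handling: a NULL in either argument gives NULL.
       let (s, len) = (s?, len?);
       // Zero or negative length: empty string (assumed from Spark docs).
       if len <= 0 {
           return Some(String::new());
       }
       // `take` stops early when the string is shorter than `len`,
       // so an oversized `len` returns the whole string.
       Some(s.chars().take(len as usize).collect())
   }
   ```
   
   Under these assumptions, `spark_left(Some("Apache Spark"), Some(3))` yields `Some("Apa")`, and any call with a `None` argument yields `None`.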
   
   **Examples:**
   ```sql
   -- Extract first 3 characters
   SELECT LEFT('Apache Spark', 3); -- Returns 'Apa'
   
   -- With column reference
   SELECT LEFT(name, 5) FROM users;
   
   -- With binary data
   SELECT LEFT(CAST('binary_data' AS BINARY), 4);
   ```
   
   ```scala
   // DataFrame API examples
   import org.apache.spark.sql.functions._
   
   // Extract first 3 characters
   df.select(left(col("text_column"), lit(3)))
   
   // Dynamic length based on another column
   df.select(left(col("description"), col("max_length")))
   
   // With literal string
   df.select(left(lit("Apache Spark"), lit(5)))
   ```
   
   ### Implementation Approach
   
   See the [Comet guide on adding new expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html) for detailed instructions.
   
   1. **Scala Serde**: Add expression handler in 
`spark/src/main/scala/org/apache/comet/serde/`
   2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
   3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if 
needed
   4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has 
built-in support first)
   
   
   ## Additional context
   
   **Difficulty:** Small
   **Spark Expression Class:** `org.apache.spark.sql.catalyst.expressions.Left`
   
   **Related:**
   - `Substring` - The underlying expression used for implementation
   - `Right` - Extracts characters from the right side of a string
   - `Mid`/`Substr` - General substring extraction with custom start position
   
   ---
   *This issue was auto-generated from Spark reference documentation.*
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]